Institutional Repository of the Institute of Software, Chinese Academy of Sciences (ISCAS OpenIR)
Subject: Computer Science (provided by Thomson Reuters)
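Title: 基于OpenCL的连续数据无关访存密集型函数并行与优化研究 (Research on the Parallelization and Optimization of Continuously Data-Independent, Memory-Access-Intensive Functions Based on OpenCL)
Alternative Title: Parallelism and Research on Functions with Continuously Independent Data and Intensive Memory Access Using OpenCL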
Author: 蒋丽媛 (Jiang Liyuan) ; 张云泉 (Zhang Yunquan) ; 龙国平 (Long Guoping) ; 贾海鹏 (Jia Haipeng)
Keyword: GPU ; OpenCL ; vectorization ; ROI
Source: 计算机科学 (Computer Science)
Issued Date: 2013
Volume: 40, Issue: 3, Pages: 111-115
Indexed Type: CNKI ; WANFANG ; CSCD
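Department: Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences; State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; College of Information Science and Engineering, Ocean University of China
Sponsorship: National Natural Science Foundation of China (60303020, 60533020); Key Program of the National Natural Science Foundation of China (60503020); Young Scientists Fund of the National Natural Science Foundation of China (61100072); National "863" Program of China (2012AA010902); ISCAS-AMD Joint Fusion Software Center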
Abstract: Continuous data independence means that, when consecutive elements of the destination matrix are computed, the source-matrix elements they use are themselves consecutive and have no dependences among them. A memory-access-intensive function is one that performs little computation but a large amount of data transfer. Taking the bitwise function as an example, this paper studies and implements the parallelization and optimization of continuously data-independent, memory-access-intensive functions on GPU platforms under the OpenCL framework. After examining how vectorization, thread organization, and instruction selection affect performance on different GPU hardware platforms, the authors achieve performance portability of the function across platforms. The experiments measure execution time without data transfer. Relative to the CPU version of the function in the OpenCV library, the optimized function reaches an average speedup of 40x on the AMD Radeon HD 5850, 90x on the AMD Radeon HD 7970, and 60x on the NVIDIA Tesla C2050. Relative to the CUDA implementation in the OpenCV library, it reaches a speedup of 1.5x on the NVIDIA Tesla C2050.
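The vectorization idea named in the abstract can be illustrated with a minimal host-side sketch in plain C. This is not the paper's OpenCL kernel, and the function names `bitwise_and_scalar` and `bitwise_and_wide` are hypothetical. The sketch only shows why a continuously data-independent function lends itself to wide accesses: because consecutive destination bytes depend only on the matching consecutive source bytes, eight of them can be loaded, combined, and stored as one 64-bit word, much as a work-item might handle a vector type such as `uchar8` in OpenCL C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar baseline: one byte per iteration, comparable to one
 * work-item per matrix element. */
static void bitwise_and_scalar(const uint8_t *a, const uint8_t *b,
                               uint8_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] & b[i];
}

/* Wide variant (illustrative assumption, not the paper's code):
 * eight bytes per iteration through a 64-bit word. memcpy keeps the
 * loads and stores safe for unaligned pointers. */
static void bitwise_and_wide(const uint8_t *a, const uint8_t *b,
                             uint8_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t x, y, r;
        memcpy(&x, a + i, sizeof x);
        memcpy(&y, b + i, sizeof y);
        r = x & y;                    /* eight byte-wise ANDs at once */
        memcpy(dst + i, &r, sizeof r);
    }
    for (; i < n; ++i)                /* tail shorter than one word */
        dst[i] = a[i] & b[i];
}
```

Both functions produce identical output for any length. The wide variant simply issues fewer, larger memory operations, which is the property that matters for a function whose cost is dominated by memory access rather than arithmetic.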
Language: Chinese
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/15559
Appears in Collections: 软件所图书馆_期刊论文 (Institute of Software Library, Journal Articles)

Recommended Citation:
蒋丽媛,张云泉,龙国平,等. 基于OpenCL的连续数据无关访存密集型函数并行与优化研究[J]. 计算机科学,2013-01-01,40(3):111-115.