ISCAS OpenIR
StreamScan: Fast scan algorithms for GPUs without global barrier synchronization
Yan, Shengen (1); Long, Guoping (1); Zhang, Yunquan (1); Yan, S.(yanshengen@gmail.com)
2013
Conference name: 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2013
Pages: 229-238
Conference date: February 23, 2013 - February 27, 2013
Conference location: Shenzhen, China
Indexed by: SCI; EI
Place of publication: Association for Computing Machinery, General Post Office, P.O. Box 30777, NY 10087-0777, United States
ISSN: 0362-1340
ISBN: 9781450319225
Affiliation: (1) Lab. of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; (2) State Key Laboratory of Computing Science, Chinese Academy of Sciences, Beijing, China; (3) Graduate University, Chinese Academy of Sciences, Beijing, China
Abstract: Scan (also known as prefix sum) is a very useful primitive for various important parallel algorithms, such as sort, BFS, SpMV, compaction and so on. The current state-of-the-art GPU-based scan implementation consists of three consecutive Reduce-Scan-Scan phases. This approach requires at least two global barriers and 3N (N is the problem size) global memory accesses. In this paper we propose StreamScan, a novel approach to implement scan on GPUs with only one computation phase. The main idea is to restrict synchronization to adjacent workgroups only, thereby eliminating global barrier synchronization completely. The new approach requires only 2N global memory accesses and just one kernel invocation. On top of this we propose two important optimizations to further boost performance, namely thread grouping to eliminate unnecessary local barriers, and register optimization to expand the on-chip problem size. We designed an auto-tuning framework to search the parameter space automatically and generate highly optimized code for both AMD and Nvidia GPUs. We implemented our technique with OpenCL. Compared with previous fast scan implementations, experimental results not only show promising performance speedups, but also reveal dramatically different optimization tradeoffs between Nvidia and AMD GPU platforms. © 2013 ACM.
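The contrast the abstract draws between the three-phase Reduce-Scan-Scan structure and a single pass synchronized only between adjacent workgroups can be illustrated with a small CPU-side sketch. It is not the paper's OpenCL kernel: Python threads stand in for workgroups, a `threading.Event` per block stands in for the adjacent-workgroup handshake, and the names `three_phase_scan`, `chained_scan`, and `block` are hypothetical. The thread grouping and register optimizations mentioned in the abstract are not modelled.

```python
import threading


def three_phase_scan(data, block):
    """Baseline sketch: Reduce-Scan-Scan, producing an inclusive scan."""
    blocks = [data[i:i + block] for i in range(0, len(data), block)]
    # Phase 1 (reduce): one partial sum per block; reads every element.
    sums = [sum(b) for b in blocks]
    # Phase 2 (scan): exclusive scan over the per-block sums.
    offsets, running = [], 0
    for s in sums:
        offsets.append(running)
        running += s
    # Phase 3 (scan): re-read every element, write every output.
    out = []
    for b, off in zip(blocks, offsets):
        acc = off
        for x in b:
            acc += x
            out.append(acc)
    return out


def chained_scan(data, block):
    """Single-pass sketch: each block waits only for its left neighbour."""
    n_blocks = (len(data) + block - 1) // block
    out = [0] * len(data)
    prefix = [0] * n_blocks  # inclusive running total published by each block
    ready = [threading.Event() for _ in range(n_blocks)]

    def work(b):
        lo, hi = b * block, min((b + 1) * block, len(data))
        # Local inclusive scan; each input element is read once.
        local, acc = [], 0
        for x in data[lo:hi]:
            acc += x
            local.append(acc)
        # Adjacent synchronization: wait for the predecessor block only.
        carry = 0
        if b > 0:
            ready[b - 1].wait()
            carry = prefix[b - 1]
        prefix[b] = carry + acc
        ready[b].set()  # release the successor before writing results
        # Each output element is written once.
        for i, v in enumerate(local):
            out[lo + i] = carry + v

    threads = [threading.Thread(target=work, args=(b,)) for b in range(n_blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    values = list(range(1, 11))
    print(three_phase_scan(values, 3))
    print(chained_scan(values, 3))
```

In this sketch the baseline touches the data three times (read in the reduce phase, then read and write in the final scan phase), while the chained version reads and writes each element once, which matches the 3N versus 2N access counts quoted in the abstract. No block ever waits on anything but its immediate predecessor, so no barrier across all blocks is needed.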
Keywords: Scan; Prefix-sum; OpenCL; CUDA; GPU; Parallel Algorithms
Language: English
WOS accession number: WOS:000324158900022
Citation statistics
Times cited: 55 [WOS]
Content type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16554
Collection: Institute of Software, Chinese Academy of Sciences
Corresponding author: Yan, S. (yanshengen@gmail.com)
Recommended citation
GB/T 7714
Yan, Shengen, Long, Guoping, Zhang, Yunquan, et al. StreamScan: Fast scan algorithms for GPUs without global barrier synchronization[C]. Association for Computing Machinery, General Post Office, P.O. Box 30777, NY 10087-0777, United States, 2013: 229-238.
Files in this item
There are no files associated with this item.