ISCAS OpenIR
Automatic FFT Performance Tuning on OpenCL GPUs
Li Yan; Zhang Yunquan; Jia Haipeng; Long Guoping; Wang Ke
2011
Conference name: 2011 17th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2011
Proceedings title: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Pages: 228-235
Conference date: December 7, 2011 - December 9, 2011
Conference location: Tainan, Taiwan
Indexed by: EI
ISSN: 1521-9097
ISBN: 9780769545769
Affiliations: (1) Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; (2) State Key Lab. of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; (3) Graduate University of the Chinese Academy of Sciences, Beijing, China; (4) School of Information Science and Engineering, Ocean University of China, Qingdao, China
Abstract: Many fields of science and engineering, such as astronomy, medical imaging, seismology and spectroscopy, have been revolutionized by Fourier methods. The fast Fourier transform (FFT) is an efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. The emerging class of high performance computing architectures, such as GPUs, seeks to achieve much higher performance and efficiency by exposing a hierarchy of distinct memories to programmers. However, the complexity of GPU programming poses a significant challenge for programmers. In this paper, based on the Kronecker product form of multi-dimensional FFTs, we propose an automatic performance tuning framework for various OpenCL GPUs. Several key techniques of GPU programming on AMD and NVIDIA GPUs are also identified. Our OpenCL FFT library achieves up to 1.5 to 4 times, 1.5 to 40 times, and 1.4 times the performance of clAmdFft 1.0 for 1D, 2D, and 3D FFTs, respectively, on an AMD GPU, and the overall performance is within 90% of CUFFT 4.0 on two NVIDIA GPUs. © 2011 IEEE.
Keywords: Algorithms; Computer Systems; Discrete Fourier Transforms; Fast Fourier Transforms; Medical Imaging
Sponsors: National Cheng Kung University; National Science Council; Ministry of Education; Academia Sinica; National Center for High Performance Computing
Language: English
Content type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16294
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Li Yan, Zhang Yunquan, Jia Haipeng, et al. Automatic FFT performance tuning on OpenCL GPUs[C], 2011: 228-235.