Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Title:
异构平台上性能自适应FFT框架
Alternative Title: An automatic performance tuning framework for FFT on heterogenous platforms
Author: 李焱 ; 张云泉
Corresponding Author: Zhang, Y.(zyq@ict.ac.cn)
Keyword: fast Fourier transform (FFT) ; auto-tuning performance ; accelerated processing unit (APU) ; graphic processing unit (GPU) ; heterogeneous
Source: 计算机研究与发展 (Journal of Computer Research and Development)
Issued Date: 2014
Volume: 51, Issue: 3, Pages: 637-649
Indexed Type: EI ; CSCD
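Department: Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190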
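Abstract: The fast Fourier transform (FFT) is widely used in science and engineering, especially in signal processing, image processing, and the solution of partial differential equations. Targeting heterogeneous platforms based on graphic processing units (GPUs) and accelerated processing units (APUs), this paper proposes MPFFT (massively parallel FFT), a massively parallel FFT framework with adaptive performance optimization. MPFFT uses a two-level adaptation strategy that operates at installation time and at runtime. At installation time, a code generator produces a library of code templates (codelets) of arbitrary length that are called by GPU kernels; at runtime, auto-tuning techniques drive the code generator to produce highly optimized GPU compute code. Experimental results show that, relative to AMD clAmdFft 1.6, MPFFT achieves average speedups of 3.45, 15.20, and 4.47 for 1D, 2D, and 3D FFTs on the APU platform, and of 1.75, 3.01, and 1.69 on the AMD HD7970 GPU. On the NVIDIA Tesla C2050 GPU, its overall performance reaches 93% of that of CUFFT 4.1, with a maximum speedup of 1.28.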
English Abstract: The fast Fourier transform (FFT) is an important computational kernel in scientific and engineering computing, with broad applicability in signal processing, image processing, and the solution of partial differential equations. In this paper, we propose an automatic performance tuning framework called MPFFT (massively parallel FFT), which is well suited to heterogeneous platforms such as GPUs (graphic processing units) and APUs (accelerated processing units). We employ a two-level adaptation methodology that operates at installation time and at runtime. At installation time, a code generator automatically generates FFT codelets of arbitrary size, which are called by GPU kernels. At runtime, the code generator also produces highly optimized GPU kernel code, guided by auto-tuning techniques. Experimental results demonstrate that MPFFT substantially outperforms the clAmdFft library on both an AMD GPU and an AMD APU. For 1D, 2D, and 3D FFTs, the average speedups of MPFFT over clAmdFft 1.6 are 3.45, 15.20, and 4.47 on the AMD APU A-360, and 1.75, 3.01, and 1.69 on the AMD HD7970. MPFFT also achieves performance comparable to the CUFFT library on an NVIDIA GPU: its overall performance reaches 93% of that of CUFFT 4.1 on the Tesla C2050, with a maximum speedup of 1.28.
Language: Chinese
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16763
Appears in Collections: Institute of Software Library_Journal Articles

Files in This Item:

There are no files associated with this item.


Recommended Citation:
李焱,张云泉. 异构平台上性能自适应FFT框架[J]. 计算机研究与发展,2014-01-01,51(3):637-649.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
