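Research on Parallel Fast Fourier Transform Algorithms and an Adaptive Tuning Framework for Sunway Multi-Core Processors (面向申威多核处理器的快速傅立叶变换并行算法与自适应调优框架研究)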
路青霖
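Major: Computer Software and Theory
Supervisor: 刘芳芳
Date: 2022-05-28
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Degree Level: Master's
Place of Degree Grantor: Beijing
Keyword: Sunway multi-core processor; FFT; computational graph; adaptive tuning framework; high-performance computing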
Abstract

The Fast Fourier Transform (FFT) is widely used in numerical computing, computer graphics, deep learning, and other fields. Because the FFT often accounts for a large share of an application's run time, a high-performance FFT library can substantially improve performance and shorten the computation time of complex applications. As China's domestic processors develop rapidly, their software ecosystem suffers from too little supporting software and insufficient adaptation, so developing high-performance basic software for these processors is important for achieving independent, controllable hardware and software. On the Sunway multi-core processor, existing open-source FFT libraries cannot fully exploit the hardware: tuning takes a long time and computational performance is unstable. A complete, multi-level adaptive tuning framework and a corresponding FFT library are still lacking.

Targeting the domestic Sunway multi-core processor, this thesis addresses the problems of the open-source library FFTW by designing and implementing a high-performance library for power-of-2 FFT computation. For the symmetric multiprocessing architecture of the processor, it designs and implements a load-balanced parallel task partitioning algorithm. It also implements an automatic tuning framework based on the computational graph model, which models the FFT decomposition problem using hardware parameters to tune the computation plan quickly. The implementation is further optimized with automatic code generation, vectorization, and data reordering. On the Sunway 3231 platform, the library achieves an average speedup of 1.94 times and a maximum of 2.71 times over FFTW; with 32 threads, an average of 5.49 times and a maximum of 39.75 times over FFTW; and the adaptive tuning framework achieves an average speedup of 1030 times over FFTW.
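To make the idea of modeling FFT decomposition as a graph problem more concrete, the following is a minimal sketch and not the thesis's actual model, which this record does not describe. It assumes a power-of-2 size N = 2^n, treats each remaining exponent as a graph node and each radix-2^r pass as an edge, and picks the cheapest path with a simple cost model. The `Hardware` fields, the `pass_cost` formula, and all numeric values are hypothetical placeholders, not Sunway parameters.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Hardware:
    """Hypothetical machine parameters (placeholders, not Sunway data)."""
    vector_lanes: int = 4     # assumed SIMD lanes
    registers: int = 32       # assumed registers available to one kernel
    flop_cost: float = 1.0    # relative cost of one arithmetic operation
    mem_cost: float = 4.0     # relative cost of touching one element per pass
    spill_cost: float = 8.0   # relative cost of one value spilled to memory


def pass_cost(n_points, log_radix, hw):
    """Assumed cost of one radix-2^log_radix pass over n_points elements."""
    radix = 1 << log_radix
    butterflies = n_points // radix
    # Rough operation count: about log_radix * radix operations per butterfly.
    arithmetic = butterflies * log_radix * radix * hw.flop_cost / hw.vector_lanes
    # Every pass streams the whole array through memory once.
    memory = n_points * hw.mem_cost
    # A butterfly keeps 2 * radix values live (real and imaginary parts).
    spills = max(0, 2 * radix - hw.registers) * butterflies * hw.spill_cost
    return arithmetic + memory + spills


def best_plan(log_n, hw, max_log_radix=5):
    """Return (cost, radices) for the cheapest decomposition of N = 2^log_n.

    Nodes are the remaining exponents log_n..0; choosing radix 2^r moves
    from node k to node k - r. The search is a shortest path on this DAG.
    """
    n_points = 1 << log_n

    @lru_cache(maxsize=None)
    def solve(remaining):
        if remaining == 0:
            return 0.0, ()
        best = None
        for r in range(1, min(max_log_radix, remaining) + 1):
            tail_cost, tail_plan = solve(remaining - r)
            cost = pass_cost(n_points, r, hw) + tail_cost
            if best is None or cost < best[0]:
                best = (cost, (1 << r,) + tail_plan)
        return best

    return solve(log_n)


if __name__ == "__main__":
    cost, plan = best_plan(10, Hardware())
    print("radices:", plan, "modelled cost:", cost)
```

Under these assumed parameters the model rewards fewer passes over memory but penalizes radices whose butterflies no longer fit in registers, so the search settles on a small number of medium-sized radices. Because the plan is derived from a model rather than from timing candidate plans on the machine, a search of this kind can be fast, which is the general motivation the abstract gives for model-based tuning.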

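Subject: Parallel Processing; Software Theory
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/19493
Collection: Laboratory of Parallel Software and Computational Science
Affiliation: Institute of Software, Chinese Academy of Sciences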
Recommended Citation
GB/T 7714
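路青霖. Research on Parallel Fast Fourier Transform Algorithms and an Adaptive Tuning Framework for Sunway Multi-Core Processors [D]. Beijing: Graduate University of Chinese Academy of Sciences, 2022.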
Files in This Item:
File Name/Size: 路青霖-面向申威多核处理器的快速傅立叶变 (1561KB)
DocType: Thesis
Access: Open Access
License: CC BY-NC-SA