Institute of Software, Chinese Academy of Sciences (ISCAS) Institutional Repository
Subject: Computer Science and Technology::Computer Architecture::Parallel Processing; Computer Science and Technology::Computer Software::Software Theory
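Title:
Research on Parallel Fast Fourier Transform Algorithms and an Adaptive Tuning Framework for Sunway Multi-core Processors (面向申威多核处理器的快速傅立叶变换并行算法与自适应调优框架研究)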
Author: Lu Qinglin (路青霖)
Issued Date: 2022-05-28
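Supervisor: Liu Fangfang (刘芳芳)
Major: Computer Software and Theory
Degree Grantor: Graduate School of the Chinese Academy of Sciences
Place of Degree Grantor: Beijing
Degree Level: Master's
Keywords: Sunway multi-core processor; FFT; computational graph; adaptive tuning framework; high-performance computing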
Abstract:

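The Fast Fourier Transform (FFT) is widely used in numerical computing, computer graphics, deep learning, and other fields. Because the FFT often accounts for a large share of computation time, a high-performance FFT library can greatly improve performance and shorten the run time of complex applications. With the rapid development of domestic Chinese processors, the associated software ecosystem faces problems such as a shortage of supporting software and insufficient adaptation; developing high-performance foundational software for domestic processors is therefore important for achieving independent, controllable software and hardware. On Sunway multi-core processors, existing open-source FFT libraries cannot fully exploit the hardware: tuning takes a long time and performance is unstable. A multi-level, complete adaptive tuning framework and a corresponding FFT library are lacking.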

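Targeting the domestic Sunway multi-core processor and the shortcomings of the open-source library FFTW, this thesis designs and implements a high-performance library that supports power-of-2 FFT computation. For the processor's symmetric multiprocessing (SMP) architecture, it designs and implements a load-balanced parallel task partitioning algorithm. It also implements an automatic tuning framework based on a computational graph model, which uses hardware parameters to model the FFT decomposition problem and thereby tunes the computation plan quickly. Optimization techniques such as automatic code generation, vectorization, and data reordering are applied. On the Sunway 3231 platform, the FFT library achieves an average speedup of 1.94 times and a maximum of 2.71 times over FFTW; with 32 threads, it achieves an average of 5.49 times and a maximum of 39.75 times over FFTW; and the adaptive tuning framework achieves an average speedup of 1030 times over FFTW.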
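The abstract does not spell out the computational-graph cost model. As a rough illustration of the general idea (scoring candidate decompositions from hardware parameters instead of timing each one, as FFTW's measurement-based planning modes such as FFTW_MEASURE do), the sketch below chooses a radix sequence for a power-of-2 length by dynamic programming over a hypothetical cost model. The radix set, FLOPS_PER_CYCLE, BYTES_PER_CYCLE, pass_cost, and best_plan are illustrative assumptions, not the thesis's model and not Sunway 3231 parameters.

```python
from functools import lru_cache
from math import log2

# Hypothetical machine parameters (illustrative only, not Sunway 3231 specs).
RADICES = (2, 4, 8, 16)      # assumed set of generated codelet sizes
FLOPS_PER_CYCLE = 8.0        # assumed sustained floating-point rate
BYTES_PER_CYCLE = 4.0        # assumed sustained memory bandwidth
COMPLEX_BYTES = 16           # one double-precision complex value


def pass_cost(n, r):
    """Modeled cycles for one radix-r pass over all n points.

    A radix-r pass performs n/r butterflies of roughly 5*r*log2(r) flops,
    i.e. about 5*n*log2(r) flops, and reads and writes the whole array once.
    Compute and memory time are simply added (no overlap is modeled).
    """
    flops = 5.0 * n * log2(r)
    traffic = 2 * COMPLEX_BYTES * n
    return flops / FLOPS_PER_CYCLE + traffic / BYTES_PER_CYCLE


def best_plan(n):
    """Return (modeled_cost, radices) for a power-of-two length n."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")

    @lru_cache(maxsize=None)
    def solve(m):
        # Cheapest way to factor the remaining size m into supported radices.
        if m == 1:
            return (0.0, ())
        best = None
        for r in RADICES:
            if m % r == 0:
                cost, rest = solve(m // r)
                cand = (cost + pass_cost(n, r), (r,) + rest)
                if best is None or cand[0] < best[0]:
                    best = cand
        return best

    return solve(n)


if __name__ == "__main__":
    for k in (4, 10, 16):
        cost, plan = best_plan(2 ** k)
        print(2 ** k, plan, round(cost))
```

With these made-up parameters the flop term is the same for every plan, so memory traffic decides and the model favours plans with the fewest passes (three for n = 1024). A realistic model would also account for local-memory capacity, vector width, and which codelets were actually generated.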

English Abstract:

The Fast Fourier Transform (FFT) is widely used in numerical computing, computer graphics, deep learning, and other fields. Because the FFT often accounts for a large share of execution time, using a high-performance FFT library can significantly improve performance and reduce computation time in applications. Along with the rapid development of China's domestic processors, the associated software ecosystem faces serious problems: little supporting software and inadequate adaptation. Therefore, developing essential, high-performance software for China's domestic processors is significant for achieving independent control of software and hardware. Existing open-source FFT libraries cannot fully utilize the hardware performance of the Sunway multi-core processor; they suffer from long tuning times and unstable performance. In addition, a multi-level, complete adaptive tuning framework and a corresponding FFT library are lacking.

This thesis designs and implements a high-performance library supporting power-of-2 FFT computation on Sunway multi-core processors. It includes an adaptive tuning framework based on the computational graph model, which models the FFT decomposition problem using hardware parameters to achieve fast tuning, and a load-balanced parallel task partitioning algorithm for processors with a symmetric multiprocessing architecture. We also apply automatic code generation, vectorization, data reordering, and other optimization techniques. On the Sunway 3231 platform, the FFT library designed and implemented in this thesis achieves an average speedup of 1.94 times and a maximum of 2.71 times compared to the open-source library FFTW; under 32 threads, it achieves an average of 5.49 times and a maximum of 39.75 times compared to FFTW; and the adaptive tuning framework achieves an average speedup of 1030 times compared to FFTW.
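The abstract names a load-balanced task partitioning algorithm for the SMP architecture but does not describe it. A minimal sketch of the basic idea, assuming the work is a batch of independent power-of-2 transforms and all cores run at the same speed: split the batch into contiguous ranges whose sizes differ by at most one and hand one range to each worker. The names fft_radix2, partition, and batched_fft are hypothetical; the textbook recursive radix-2 Cooley-Tukey kernel stands in for the library's generated codelets, and Python's ThreadPoolExecutor only mirrors the structure (the GIL prevents real speedup here).

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_n^k = exp(-2*pi*i*k/n) applied to the odd half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def partition(total, nthreads):
    """Split range(total) into nthreads contiguous [lo, hi) ranges.

    The first total % nthreads ranges get one extra task, so sizes differ
    by at most one (assumes every task costs the same, as on SMP cores).
    """
    if nthreads <= 0:
        raise ValueError("nthreads must be positive")
    base, extra = divmod(total, nthreads)
    ranges, lo = [], 0
    for t in range(nthreads):
        hi = lo + base + (1 if t < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def batched_fft(batch, nthreads):
    """Transform every signal in batch, one contiguous range per worker."""
    def work(bounds):
        lo, hi = bounds
        return [fft_radix2(batch[i]) for i in range(lo, hi)]

    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        parts = list(pool.map(work, partition(len(batch), nthreads)))
    # pool.map preserves order, so concatenation restores the batch order.
    return [y for part in parts for y in part]
```

Contiguous ranges keep each worker's inputs adjacent in memory. When task costs differ (for example, a batch mixing transform sizes), an equal-count split no longer balances the load and a weighted scheme such as longest-processing-time-first scheduling would be needed instead.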

Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/19493
Appears in Collections: Parallel Computing Laboratory_Theses

Files in This Item:
File Name: 路青霖-面向申威多核处理器的快速傅立叶变换并行算法与自适应调优框架研究.pdf
File Size: 1561KB
Content Type: Thesis
Version: --
Access: Restricted; contact the repository to obtain the full text

description.institution: Institute of Software, Chinese Academy of Sciences

Recommended Citation:
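Lu Qinglin. Research on Parallel Fast Fourier Transform Algorithms and an Adaptive Tuning Framework for Sunway Multi-core Processors [D]. Beijing: Graduate School of the Chinese Academy of Sciences, 2022-05-28.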

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2022 Institute of Software, Chinese Academy of Sciences