共享存储并行编程应用研究及TLB对基于层次存储计算模型的影响初探
Alternative Title: Parallel programming application research on shared memory system and the primary exploration of the TLB effect to the parallel computing model with hierarchical memory
宋刚
2007-06-06
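Degree Grantor: Institute of Software, Chinese Academy of Sciences
Degree Level: Doctoral
Place of Degree Grantor: Institute of Software
Keyword: parallel programming; multithreading; OpenMP; FEPG; gzip; PAPI; TLB; parallel computing model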
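English Abstract: High-performance computing has advanced greatly in recent years, both in China and abroad. Hardware has seen major breakthroughs (for example, the development of high-performance computers), but parallel application software has progressed slowly, so its development now lags seriously behind that of parallel hardware platforms. These computing resources can be fully used only if parallel application software keeps pace with the hardware. The arrival of multi-core processors in particular confronts users with ubiquitous parallelism. Parallelizing existing serial algorithms and implementing them efficiently on parallel computers has become an urgent task for high-performance computing. This thesis therefore concentrates on the parallel program design and development of the core algorithms of application software. At the end, it also makes a preliminary study of how the TLB affects the analytical accuracy of parallel computing models based on hierarchical memory systems.
The parallel program design and development part targets shared-memory parallel systems and studies how to design parallel programs with OpenMP, the shared-memory programming standard, covering both numerical and non-numerical applications. On the numerical side, we developed a multi-core version of the finite element element-computation subroutine for 飞箭软件有限公司. The thesis discusses the problems we met during shared-memory parallel programming and presents our approach and solutions. Tests on a Dawning (曙光) server with four dual-core processors show a speedup above 5 with 8 threads. On the non-numerical side, we designed and developed a parallel compression program implemented with OpenMP and analyzed its key implementation techniques in detail. Tests on the Dawning four-way dual-core server gave an average speedup of 9.27.
In addition, the thesis analyzes the strengths and weaknesses of various parallel computing models based on hierarchical memory. Using PAPI to measure and analyze the performance of three forms of blocked matrix multiplication, it discusses the effect of the TLB on program performance and further confirms the need to include the TLB in parallel computing models based on hierarchical memory.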
Abstract: High performance computing technology has developed rapidly, especially in hardware (such as high performance computers), but the development of parallel software has been relatively slow. Only when parallel software keeps pace with parallel hardware can these high performance computing resources be put to better use. With the emergence of dual-core and multi-core processors in the PC market, parallel computing is becoming pervasive. Parallelizing existing sequential programs so that they run efficiently on multi-core HPC systems has become an urgent task. This thesis therefore focuses on the design and development of parallel software, and also discusses the influence of the TLB on parallel computing models with hierarchical memory. In the design and development part, we focus on shared-memory machines and study how to use OpenMP, the shared-memory parallel programming standard. We developed both numerical and non-numerical parallel programs and tested their performance. For the numerical application, we developed a multi-core version of the element computation subroutine of the finite element method, analyzed the problems we met, and gave our solutions. Its final speedup exceeds 5 on a server with four-way dual-core AMD Opteron processors. For the non-numerical application, we used OpenMP to parallelize the gzip compression software. Performance tests on a server with four-way dual-core AMD processors gave an average speedup of 9.27. We also analyzed different parallel computing models with hierarchical memory and used PAPI to test the performance of three kinds of blocked matrix multiplication algorithms. The experimental results further verify the necessity of adding TLB factors to the parameter space of parallel computing models with hierarchical memory.
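The abstract mentions blocked matrix multiplication without saying which three variants the thesis measured. As a rough illustration only, the sketch below shows one generic tiled form next to a plain triple loop. It is written in Python for readability and is not the thesis code, which according to the abstract was built with OpenMP and measured with PAPI. The function names and the `block` parameter are hypothetical.

```python
def matmul_naive(a, b):
    """Plain triple-loop product of an n x m and an m x p matrix (lists of lists)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Tiled product: the same arithmetic, visited one block x block tile at a time.

    `block` is a hypothetical tuning parameter. In compiled code the tile size
    decides how many distinct cache lines and memory pages the inner loops touch.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):              # tile rows of a and c
        for kk in range(0, m, block):          # tile the shared dimension
            for jj in range(0, p, block):      # tile columns of b and c
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, p)):
                            c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_blocked(a, b, block=2) == matmul_naive(a, b))
```

Both functions compute the same product; only the order in which memory is visited differs. That ordering is the aspect a TLB-aware model would need to capture, and in interpreted Python the hardware effect itself is not observable.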
Pages: 71
Language: Chinese
Content Type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/5746
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
宋刚. 共享存储并行编程应用研究及TLB对基于层次存储计算模型的影响初探[D]. 软件研究所. 中国科学院软件研究所,2007.
Files in This Item:
File Name/Size: 10001_20042801502905 (2505KB)
Access: Restricted
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.