Institutional Repository of the Institute of Software, Chinese Academy of Sciences (ISCAS OpenIR)
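Title:
共享存储并行编程应用研究及TLB对基于层次存储计算模型的影响初探
Author: Song Gang (宋刚)
Defense date: 2007-06-06
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software
Degree: Doctoral
Keywords: parallel programming ; multithreading ; OpenMP ; FEPG ; gzip ; PAPI ; TLB ; parallel computing model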
Alternative title: Parallel programming application research on shared memory system and the primary exploration of the TLB effect to the parallel computing model with hierarchical memory
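Abstract: High-performance computing technology has advanced greatly in recent years, both in China and abroad. Hardware has seen major breakthroughs (such as the development of high-performance computers), but parallel application software has developed comparatively slowly, so that its development now lags seriously behind that of parallel hardware platforms. These high-performance computing resources can be fully exploited only if parallel application software keeps pace with the hardware. The introduction of multi-core processors in particular confronts users with ubiquitous parallelism. Parallelizing existing serial algorithms and implementing them efficiently on parallel computers has therefore become an urgent task for high-performance computing. This thesis accordingly focuses on parallel program design and development for the core algorithms of application software; it closes with a preliminary discussion of how the TLB affects the analytical accuracy of parallel computing models based on hierarchical memory systems. The parallel program design and development part targets shared-memory parallel computing systems and studies how to design parallel programs with OpenMP, the shared-memory programming standard, covering both numerical and non-numerical application examples. On the numerical side, we developed a multi-core version of the finite element element-computation subroutine for Feijian Software Co., Ltd. (飞箭软件有限公司); the thesis discusses problems we encountered in shared-memory parallel programming and presents our approaches and solutions. Measurements on a Dawning (曙光) server with four dual-core processors show a speedup above 5 with 8 threads. On the non-numerical side, we designed and developed parallel compression software implemented with OpenMP and analyzed its key implementation techniques in detail; tests on a Dawning four-way dual-core server yielded an average speedup of 9.27. In addition, the thesis analyzes the strengths and weaknesses of various parallel computing models based on hierarchical memory structures. By using PAPI to measure and analyze the performance of three different forms of blocked matrix multiplication, it discusses the effect of the TLB on program performance and further confirms the need to incorporate the TLB into parallel computing models based on hierarchical memory structures.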
English abstract: High-performance computing technology has developed rapidly, especially in hardware (such as high-performance computers), but the development of parallel software has been relatively slow. Only when parallel software keeps pace with parallel hardware can these high-performance computing resources be put to good use. With the emergence of dual-core and multi-core processors in the PC market, parallel computing is becoming pervasive. Parallelizing existing sequential programs so that they run efficiently on multi-core HPC systems has become an urgent task. This thesis therefore focuses on the design and development of parallel software, and also discusses the influence of the TLB on parallel computing models with hierarchical memory. In the design and development part, we focus on shared-memory machines and study how to use OpenMP, the shared-memory parallel programming standard. We developed both numerical and non-numerical parallel programs and tested their performance. For the numerical application, we developed a multi-core version of the element computation subroutine of the finite element method, analyzed the problems we encountered, and gave our solutions. Its final speedup exceeds 5 on a four-way server with dual-core AMD Opteron processors. For the non-numerical application, we used OpenMP to parallelize the gzip compression software. Performance tests on a four-way server with dual-core AMD processors gave an average speedup of 9.27. We also analyzed different parallel computing models with hierarchical memory and used PAPI to measure the performance of three different blocked matrix multiplication algorithms. Our experimental results further confirm the necessity of adding TLB factors to the parameter space of parallel computing models with hierarchical memory.
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/5746
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name / File Size / Content Type / Version / Access / License
10001_200428015029059宋刚_paper.doc / 2505KB / -- / -- / Restricted access / -- (contact the repository to obtain the full text)

Recommended Citation:
Song Gang (宋刚). 共享存储并行编程应用研究及TLB对基于层次存储计算模型的影响初探 [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2007-06-06.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.