Institutional Repository of the Institute of Software, Chinese Academy of Sciences (ISCAS OpenIR)
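Title:
Research on the Performance Evaluation and Communication Optimization of Large-Scale Cluster Systems
Author: Tang Yuan (唐渊)
Defense date: 2004
Discipline: Computer Software and Theory
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Ph.D.
Keywords: large-scale cluster system; performance evaluation; communication optimization; communication behavior patterns; hot spot test; user-level communication; LINPACK standard; FFT standard
Other title: Research on the Performance Evaluation and Communication Optimization of Large Scale Cluster System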
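Abstract: Traditionally, a set of complete computers linked by an interconnect is called a "cluster"; clusters are used to share a program's workload and to improve system availability. Historically, many large-scale scientific computing applications were first developed on early supercomputers with shared-memory architectures. Porting them to distributed-memory cluster systems requires work on communication algorithms (reducing global communication and using local communication wherever possible) and on improved iterative algorithms. Equally important, the lower layers of the communication platform used by modern clusters, in both hardware and system software, must be improved and specifically optimized. This includes simplifying communication protocols such as MPI over TCP/IP on the 100 Mbit/s Ethernet used by the original cluster systems, and developing suitable communication optimization techniques.

Real large-scale scientific computing applications on modern high-performance clusters consistently show that communication takes too large a share of run time. Examples include the 863 project "Fine-Scale Numerical Reservoir Simulation of Large Integrated Oil Fields" (863-306-2D01-03-1) led by Prof. Sun Jiachang, the 863 project "Research on Distributed Large-Scale Parallel Numerical Theory and Algorithms" (863-306-2D01-03-2) led by Cao Jianwen, and the 973 project "Computation of Reservoir Simulation, Wave Problems and Their Inverse Problems" (G1999032803) led by Prof. Sun. For instance, in 2000 the Parallel Computing Center's reservoir simulation software spent roughly 60% of its time on communication on the cluster of that time. Moreover, as the number of node processors grows, the parallelism and scalability of the whole application drop quickly, and only a small fraction of theoretical peak performance is achieved. At the recent Supercomputing 2002/2003 conferences, the premier academic meetings of the supercomputing community jointly sponsored by ACM and IEEE, many papers independently pointed out a critical and urgent problem: collective communication, and in particular the sharp growth of its overhead as the number of communicating processes increases. Solving it is key to improving the parallel efficiency of large-scale scientific computing and to making it scale well on tens, hundreds, or even thousands of processors. This thesis therefore focuses on how, in both theory and practice, to use high-performance cluster systems effectively and to optimize communication for several key real applications.

To optimize specific large-scale scientific applications running on modern high-performance clusters, and to find the right starting point, we carried out extensive testing and analysis from system software up to concrete applications. First, we quantitatively studied the strengths and weaknesses of several main technologies for building modern high-performance clusters (different compute nodes, different underlying communication media, etc.). Second, we measured large-scale scientific computing benchmarks such as the NAS Parallel Benchmarks on LSSC2, the cluster system of the national 973 project, and obtained their performance curves. By comparing these data with internationally published data and conclusions, we summarized several patterns and made practical recommendations on how best to use China's existing high-performance clusters. Third, based on extensive testing and analysis of GAMMA, an important user-level communication protocol, we summarized the strengths and weaknesses of user-level communication protocols and presented TMachine, a user-level communication protocol framework suited to small clusters.

We then ran large parallel applications, such as Class D of NAS Parallel Benchmarks version 2.4 (released at the end of 2002), on LSSC2, which is equipped with both Myrinet 2000 and 100 Mbit/s Fast Ethernet. This revealed a very strange phenomenon: despite its relatively low latency and high bandwidth, Myrinet 2000 delivered far lower measured performance on the LU program of NPB 2.4 than Fast Ethernet did. This held regardless of problem size (Class=A/B/C/D), number of compute nodes (NPROCS=2, 4, …, 256), compiler optimization flag (-O/-O2/-O3), or message-receiving mode on Myrinet 2000 (polling/blocking/hybrid). Historically, cluster network performance has been viewed from two angles. The first considers mainly the network's short-message latency and its maximum bandwidth for long messages, and treats these two as the most important indicators of network performance. The second considers the performance of high-performance networks through communication models such as PRAM, BSP, LogP, and LogGP. Neither view, however, gives a reasonable explanation of the strange LU phenomenon. After extensive experimental study we therefore proposed the notion of a "hot spot test" to guide research on such phenomena. Further, drawing on our experience testing and modeling the NPB 2.4 programs, we present a methodology for modeling the communication performance of large-scale parallel software for real applications.

In China, the performance evaluation of high-performance computing systems has gone through three stages: theoretical peak, LINPACK peak, and, today, application-based evaluation. Yet the metric used by both the international TOP500 list and the domestic TOP50 list, the measured LINPACK peak, does not measure modern high-performance computing systems comprehensively. This is mainly because LINPACK focuses on a single problem domain, solving dense linear systems, which is somewhat one-sided, and because LINPACK has no specific, quantitative measure of communication performance. We therefore urgently need another benchmark that reflects a substantial class of real applications, to complement the LINPACK-dominated performance evaluation of modern high-performance computing systems.

The main innovations and contributions of this thesis are as follows. (1) Driven by application requirements, it advances research on the communication hardware and system software of high-performance computing platforms, approaching the work from the perspective of real large-scale scientific applications. (2) It studies in depth "overlapping communication with communication", a major communication optimization used by the program involved in the LU phenomenon. (3) It proposes the notion of a "hot spot test": testing how well a cluster network's communication protocol (hardware and software) supports the various behavior patterns that upper-level programs may exhibit. Hot spot tests must be added on top of point-to-point (P2P) and LogGP tests to reflect reasonably completely how a particular system affects applications. (4) In view of the shortcomings of LINPACK, the benchmark behind today's TOP500 ranking of supercomputers, it proposes principles and methods for testing and evaluating the communication performance of large-scale scientific computing. In particular, it proposes an FFT standard for testing collective communication capability in high-performance computing, and gives its specific purpose, principles, method, and measured examples using a concrete case: HFFT, a fast generalized discrete Fourier transform algorithm for a class of irregular domains, and the FFTH software package it belongs to. (5) Drawing on our experience testing and modeling the NPB 2.4 programs and on related work in the literature, and focusing on programs' computation and communication behavior and their interaction, it presents a methodology for modeling the communication performance of large-scale parallel software for real applications.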
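The abstract attributes part of the scalability problem to collective communication whose cost grows with the number of processes. The Python sketch below illustrates why the choice of collective algorithm matters. It assumes the textbook alpha-beta cost model, with per-message latency alpha and per-byte time beta; this model is not taken from the thesis. The parameter values are hypothetical. The code compares a naive linear broadcast with a binomial-tree broadcast:

```python
import math


def linear_bcast_time(p: int, m: float, alpha: float, beta: float) -> float:
    """Linear broadcast: the root sends the m-byte message to each of the other p-1 processes in turn."""
    return (p - 1) * (alpha + beta * m)


def binomial_bcast_time(p: int, m: float, alpha: float, beta: float) -> float:
    """Binomial-tree broadcast: the number of processes holding the data doubles each round."""
    rounds = math.ceil(math.log2(p)) if p > 1 else 0
    return rounds * (alpha + beta * m)


if __name__ == "__main__":
    # Hypothetical alpha-beta parameters, not measurements from the thesis.
    alpha = 50e-6    # per-message latency in seconds
    beta = 1e-8      # per-byte transfer time in seconds
    m = 8 * 1024     # message size in bytes
    for p in (2, 16, 256):
        print(p, linear_bcast_time(p, m, alpha, beta), binomial_bcast_time(p, m, alpha, beta))
```

Under these assumptions, the linear scheme costs p-1 message steps while the tree costs ceil(log2 p) steps. At 256 processes that is 255 steps versus 8. The simple model ignores network contention and software overhead.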
English abstract: Traditionally, a cluster is a number of computers connected by an interconnect, used to share the workload and improve the availability of the whole system. Clusters are a newly emerging high-performance computing solution. Many large-scale scientific computing applications were originally developed on shared-memory supercomputers, so considerable effort is needed to port them to distributed-memory systems. This effort includes redesigning communication algorithms, rescheduling communication behavior, refining communication protocols, and so on. Recent large-scale scientific applications on clusters show that communication takes an ever larger share of an application's total wall-clock time. This share also rises as the number of nodes increases, which indicates declining parallelism and scalability of real applications and hence lower efficiency of cluster systems. At ACM/IEEE Supercomputing 2002/2003, several papers indicated that collective communication deserves more attention as more nodes are used for co-computing. How to use high-performance computing clusters efficiently and how to optimize communication for some key applications are therefore the focus of this thesis. To find a proper starting point, we carried out extensive tests and analysis, from the low-level communication protocols up to real applications. In chapter 2 we quantify the advantages and disadvantages of several main kinds of cluster configurations. In chapter 3 we give a detailed testing report on state-of-the-art clusters in the People's Republic of China, together with some important conclusions and suggestions.
After thorough testing and analysis of GAMMA, a state-of-the-art user-level communication protocol, we identified the key issues for user-level communication in cluster environments. On that basis we designed and implemented a prototype system named TMachine. We then ran LU from NPB 2.4 (NAS Parallel Benchmarks) on LSSC2, which is equipped with both Myrinet 2000 and 100 Mbit/s Fast Ethernet. A very abnormal phenomenon appeared: LU ran much faster on Ethernet than on Myrinet, although Myrinet's latency and bandwidth are much better. This held no matter how many nodes were used, how large the problem class was (A/B/C/D), which compiler flag was used (-O/-O2/-O3), or which Myrinet receiving mode was used (polling/blocking/hybrid). Historically there have been two approaches to evaluating the performance of cluster area networks (cLAN). One school of thought is primarily interested in round-trip latency and large-message bandwidth as indicators of network performance. Another adopts a more detailed model of network performance, such as LogP or LogGP. Neither point of view can accurately explain the abnormal LU phenomenon. After thorough research, we put forward the idea of the hot spot test. Going one step further, based on our testing and modeling experience with the NPB 2.4 LU program, we present a framework that facilitates communication modeling of large-scale parallel applications. In China, the evaluation of high-performance computing systems has gone through three stages: theoretical peak performance, LINPACK number, and real application performance. Because LINPACK alone does not adequately evaluate a whole computing system, this thesis seeks other representative applications to supplement it.
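The two schools of thought named above can be made concrete with a small calculation. LogGP describes a network by four parameters:

- latency L,
- per-message CPU overhead o,
- minimum gap g between consecutive messages,
- per-byte gap G for long messages.

Under this model, a single k-byte message costs o + (k-1)G + L + o. The Python sketch below contrasts this with the plain latency-plus-bandwidth view, using hypothetical parameter values. It only illustrates the two views. As the abstract states, neither explains the LU phenomenon, and the thesis's own hot spot test is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class LogGP:
    """LogGP network parameters (hypothetical values are supplied by the caller)."""
    L: float  # network latency
    o: float  # CPU overhead to send, or to receive, one message
    g: float  # minimum gap between consecutive short messages (not needed for one message)
    G: float  # gap per byte for long messages

    def message_time(self, k: int) -> float:
        """One k-byte message: send overhead, (k-1) per-byte gaps, latency, receive overhead."""
        return self.o + (k - 1) * self.G + self.L + self.o

    def round_trip(self, k: int) -> float:
        """Ping-pong: a k-byte message out and a k-byte reply back."""
        return 2 * self.message_time(k)


def latency_bandwidth_time(latency: float, bandwidth: float, k: int) -> float:
    """Simple view: fixed latency plus k bytes transferred at peak bandwidth (bytes per second)."""
    return latency + k / bandwidth


if __name__ == "__main__":
    # Hypothetical parameters in seconds, not measurements from the thesis.
    net = LogGP(L=10e-6, o=5e-6, g=8e-6, G=1e-8)
    for k in (8, 1024, 1 << 20):
        print(k, net.message_time(k), latency_bandwidth_time(20e-6, 1e8, k))
```

Separating o from L matters because o occupies the processor and cannot be overlapped with computation, whereas L can. The single-number latency view hides that distinction.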
The main innovations and contributions of this thesis are as follows. Driven by the requirements of real applications, it advances research on the hardware and software of high-performance computing systems. In chapter 4, we found that some real applications run much slower on Myrinet 2000 than on 100 Mbit/s Fast Ethernet, which seems eccentric. We tracked down this abnormal phenomenon and uncovered its underlying cause. Further tests on Gigabit Ethernet and InfiniBand indicated that these low-latency, high-bandwidth cluster area networks may have similar problems. LINPACK is useful, but no single test can accurately reflect the overall performance of HPC (high-performance computing) systems. We propose some principles and methods for testing the communication performance of HPC systems. These include, in particular, the FFT standard for testing and evaluating collective communication, and we illustrate the steps and results of a real FFT standard test. Based on our testing and modeling experience with the NPB 2.4 LU program and related work in the literature, and focusing on the computation and communication behavior of real applications, we present a framework that facilitates communication modeling of large-scale parallel applications.
Language: Chinese
Content type: Doctoral dissertation
URI: http://ir.iscas.ac.cn/handle/311060/6056
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
LW013934.pdf (1985 KB). Content type, version and license: not specified. Access: restricted; contact the repository to obtain the full text.

Recommended Citation:
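Tang Yuan (唐渊). Research on the Performance Evaluation and Communication Optimization of Large Scale Cluster System [D]. Institute of Software, Chinese Academy of Sciences, 2004-01-01.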

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences