中国科学院软件研究所机构知识库
Advanced  
ISCAS OpenIR  > 综合信息系统技术国家级重点实验室  > 学位论文
题名:
MPICH同步通信程序死锁检测研究
作者: 廖名学
答辩日期: 2008-06-04
导师: 范植华
专业: 计算机应用技术
授予单位: 中国科学院研究生院
授予地点: 中国科学院软件研究所
学位: 博士
关键词: MPI ; MPICH ; 死锁 ; 同步通信 ; 模型
其他题名: Research on Deadlock Detection in MPICH Synchronization Communication Programs
分类号: 暂无
索取号: 暂无
部门归属: 综合信息系统技术国家级重点实验室
摘要: MPI是分布式内存并行处理计算机上开发基于消息传递应用系统的事实标准,主要用于并行计算机和集群的高性能运算,MPICH是其重要实现。MPI程序可能发生死锁,而且调试困难,国际上主要采用运行时动态调试方法。但是,这些方法的致命缺陷是发现死锁后,死锁造成的损失已无可挽回;况且,它们无法用于全节点空间上。为解决这些问题,针对MPICH点对点同步通信程序,本文提出了一种静态方法。 根据静态方法程序建模的需要,开发了一种基于JavaCC技术的编译器前端工具,该工具对MPICH程序进行编译产生程序语法树。 提出一种MPICH同步通信程序的建模方法,并设计出一种模型构造器遍历语法树来构造模型。根据程序性质,模型构造器可构造出三种模型:无参数模型,一次参数化模型和二次参数化模型。无参数模型不含任何参数;一次参数化模型含有一个并行节点ID的参数;二次参数化模型含有并行节点ID以及并行节点数量两个参数。构造出一种优化算法将全节点空间上的一次参数化模型转化为无参数模型之后进行死锁判定;对二次参数化模型,采用穷举方法转化为无参数模型后进行死锁判定。 针对非对称条件程序死锁性质的不确定性,提出对称条件假设。根据该假设,无参数模型被转化为循环嵌套模型之后再进行死锁判定。 为提高循环嵌套模型死锁判定效率,提出比例方程组概念。设计出建立并求解比例方程组的线性时空复杂度算法。比例方程组无解有两种情况:比例方程组包含第一类或者第二类比例冲突。提出比例冲突相关的方法,在常数时间内通过分析循环模型中产生冲突的几个循环的循环次数是否满足死锁条件来判断死锁。在比例方程组有解的情况下,给出了一种算法截取该模型的常数个循环片断进行死锁判定。为对循环片断进行死锁判定,开发了一种线性时空复杂度算法将片断顺序化后进行死锁判定。 为进一步提高算法效率,对Java哈希表类库进行了优化,使该库能够完全避免由于存取比例数据而产生的哈希碰撞。 理论证明,实验表明,该方法能够在运行前,最坏情况下以线性时空复杂度,判定点对点MPICH同步通信程序在全节点空间的死锁性质。
英文摘要: Mainly used in high performance computing on parallel machines and clusters, MPI is a facto standard for developing message-passing based applications in parallel-processing computers with distributed memory. MPICH is an important MPI implementation. MPI programs may have deadlocks but debugging is very difficult. Run-time dynamic methods are mostly adopted worldwide. However, they have a deadly deficiency that loss by deadlock is unavoidable when deadlocks are found. Moreover, they can not be used in whole-node space. Aiming at deadlock detection in MPICH point-to-point synchronous communication programs, a static method is originally presented to solve these problems in this paper. To meet requirements of program modeling for static methods, a JavaCC-based compiler front end is developed to compile MPICH programs and to build their syntax trees. A method for modeling MPICH synchronization communication programs is put forward and a model builder is designed to traverse the syntax tree to produce models. According to program property, three kinds of models may be built: 0-parameter model, one-parameter model and two-parameter model. 0-parameter model has no parameter. One-parameter model has a parameter called parallel node ID. Besides ID two-parameter model has another parameter called parallel node amount. An optimized algorithm is constructed to translate one-parameter models into 0-parameter ones for deadlock detection. For two-parameter models an exhaustive process is employed to executing such translation. Conditional symmetry hypothesis is proposed to avoid uncertainty in deadlock detection in programs with asymmetric conditions. Based on this hypothesis, 0-parameter model will be turned into loop nested model before deadlock detection. In order to improve efficiency in deadlock detection in loop nested model, ratio equation group (REG) concept is given and an algorithm with linear time-space complexity is devised to build REG from the model and to resolve it. There are two cases where REG has no solution: the first-class conflict and second-class conflict hidden in REG. A method on ratio conflicts is brought forward to decide a deadlock in constant time by checking if loop times of conflicting loops of the model meet certain conditions. If there is a solution to REG, an algorithm is advanced to slice the model into several loop sections for deadlock detection. An algorithm with linear time-space complexity is created to transform the sections into a sequential form and then to detect deadlocks in this form. To improve efficiency further in these algorithms for deadlock detection, the Java hash table class library is optimized so that hash conflicts by proportional data access can be completely cleared. Both theory reasoning and experiments show that, with a linear time-space complexity under worst cases, the method can decide what deadlock properties a point-to-point MPICH synchronous communication program has throughout whole-node space before the program goes into running.
内容类型: 学位论文
URI标识: http://ir.iscas.ac.cn/handle/311060/6302
Appears in Collections:综合信息系统技术国家级重点实验室 _学位论文

Files in This Item:
File Name/ File Size Content Type Version Access License
10001_200518015029029廖名学_paper.pdf(2244KB)----限制开放-- 联系获取全文

Recommended Citation:
廖名学. MPICH同步通信程序死锁检测研究[D]. 中国科学院软件研究所. 中国科学院研究生院. 2008-06-04.
Service
Recommend this item
Sava as my favorate item
Show this item's statistics
Export Endnote File
Google Scholar
Similar articles in Google Scholar
[廖名学]'s Articles
CSDL cross search
Similar articles in CSDL Cross Search
[廖名学]‘s Articles
Related Copyright Policies
Null
Social Bookmarking
Add to CiteULike Add to Connotea Add to Del.icio.us Add to Digg Add to Reddit
所有评论 (0)
暂无评论
 
评注功能仅针对注册用户开放,请您登录
您对该条目有什么异议,请填写以下表单,管理员会尽快联系您。
内 容:
Email:  *
单位:
验证码:   刷新
您在IR的使用过程中有什么好的想法或者建议可以反馈给我们。
标 题:
 *
内 容:
Email:  *
验证码:   刷新

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Valid XHTML 1.0!
Copyright © 2007-2017  中国科学院软件研究所 - Feedback
Powered by CSpace