Design and Implementation of a Data-Intensive Workflow System for the Hadoop Platform (面向Hadoop平台的数据密集型工作流系统的设计与实现)
李奇原
Major: Computer Software and Theory
Supervisor: 许舒人
2012-05-30
Degree Grantor: Graduate University of Chinese Academy of Sciences
Degree Level: Master's
Place of Degree Grantor: Beijing
Keyword: data-intensive applications; Hadoop workflow; BPEL
Abstract

 

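Traditional workflow systems cannot meet enterprises' needs for building data-intensive applications and must draw on the Hadoop platform's ability to process big data. Existing Hadoop workflow systems build Hadoop workflows with custom description languages and cannot communicate with an enterprise's existing workflow systems, which makes it hard for enterprises to combine existing system services with the Hadoop platform in workflows for data-intensive applications. Building Hadoop workflows in BPEL draws on the strengths of BPEL as a traditional workflow language (rich expressiveness, the ability to be integrated as a single Web service, and support for long-running stateful interactions) while also using the Hadoop platform's capacity for data-intensive processing, and is therefore an effective way to address these problems.
    This thesis first analyzes the significance of using Hadoop workflows for data-intensive applications and the shortcomings of current Hadoop workflow systems, including the inability to interact with existing enterprise workflow systems, weak expressiveness, and the lack of workflow-level scheduling and monitoring. It then proposes a data-intensive workflow system for the Hadoop platform that addresses these problems. On this basis, the research covers three areas: a rule-based model transformation method, a fair scheduling method for Hadoop workflows, and run-time workflow monitoring. For rule-based model transformation, the thesis defines the Hadoop workflow model and the BPEL model, specifies the transformation rules, and designs a rule-based transformation framework that efficiently converts Hadoop workflow models into BPEL models. For fair scheduling, the thesis proposes FlowS, a fair scheduling method for Hadoop workflows. FlowS uses workflow pools to organize workflows and allocate resources, which ensures isolation between workflows; it also uses a dynamic workflow-pool construction algorithm to distribute resources fairly among workflows. For run-time monitoring, the thesis persists the workflow model and updates it asynchronously to reduce the cost of rendering views. It also proposes creating a monitoring instance for each active workflow to handle monitoring requests and failure detection, which ensures that workflows execute correctly and further reduces monitoring overhead.
    Finally, the thesis applies these results to design and implement a data-intensive workflow system for the Hadoop platform.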

 


Traditional workflow systems have limited data-processing capability, so enterprises have to build data-intensive applications on the Hadoop platform. Existing Hadoop workflow systems build Hadoop workflows in user-defined languages, which makes it difficult to communicate and integrate with existing workflow systems. Building Hadoop workflows in BPEL exploits both the advantages of BPEL as a traditional description language and the Hadoop platform's capacity for processing data-intensive applications, so it is an effective way to solve these problems.

This thesis analyzes the benefits of handling data-intensive applications with Hadoop workflows and the drawbacks of existing Hadoop workflow systems, which include the inability to interact with enterprise workflow systems, weak descriptive power, and the lack of workflow-level scheduling and monitoring. To solve these problems, the thesis proposes a Hadoop-oriented data-intensive workflow system. It focuses on rule-based model transformation, a fair scheduling method, and run-time monitoring for Hadoop workflows. For rule-based model transformation, the thesis designs mapping rules and an efficient model transformation framework. For fair scheduling of Hadoop workflows, the thesis proposes a fair scheduling method, FlowS. This method provides isolation between Hadoop workflows through workflow pools and ensures fair resource allocation through a dynamic pool construction algorithm. For run-time monitoring, the thesis persists the Hadoop workflow model and updates the view asynchronously to reduce the workload of the presentation layer. In addition, it proposes creating a monitor instance for every active Hadoop workflow, together with a thread pool, to identify failures and reduce the overhead of monitoring.

Finally, the thesis presents the design and implementation of a Hadoop-oriented data-intensive workflow system that applies the research results above.
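
The abstract says that FlowS groups workflows into pools and divides resources fairly among them, but it does not give the algorithm. The sketch below is therefore not FlowS. It is a generic max-min fair split of task slots across workflow pools, included only to illustrate what "fair allocation across pools" can mean. The function name `fair_share`, the pool names, and the idea of a per-pool slot demand are all assumptions made for this illustration.

```python
def fair_share(total_slots, demands):
    """Max-min fair split of total_slots among pools (illustrative, not FlowS).

    demands maps a hypothetical pool name to the number of slots it wants.
    No pool receives more than its demand; unused share is redistributed.
    """
    allocation = {name: 0 for name in demands}
    remaining = total_slots
    # Pools whose demand is not yet satisfied, in a fixed order for tie-breaking.
    active = sorted(name for name, d in demands.items() if d > 0)
    while remaining > 0 and active:
        # Equal share per unsatisfied pool, at least one slot per round.
        share = max(remaining // len(active), 1)
        still_active = []
        for name in active:
            grant = min(share, demands[name] - allocation[name], remaining)
            allocation[name] += grant
            remaining -= grant
            if allocation[name] < demands[name]:
                still_active.append(name)
        active = still_active
    return allocation


if __name__ == "__main__":
    # Three hypothetical workflow pools competing for 10 slots.
    print(fair_share(10, {"etl": 2, "report": 8, "ml": 8}))
    # prints {'etl': 2, 'report': 4, 'ml': 4}
```

In this example the small pool is fully satisfied and the slots it does not need are split evenly between the two larger pools, so no workflow pool can starve another.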

Subject: Software Engineering
Language: English
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/14496
Collection: Technology Center of Software Engineering
Recommended Citation
GB/T 7714
李奇原. 面向Hadoop平台的数据密集型工作流系统的设计与实现[D]. 北京. 中国科学院研究生院,2012.
Files in This Item:
File Name/Size: 李奇原大论文-打印版2.pdf (2225KB)
Access: Open Access
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.