Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Subject: Computer Software :: Software Engineering
Title:
Design and Implementation of a Hadoop-Oriented Data-Intensive Workflow System (面向Hadoop平台的数据密集型工作流系统的设计与实现)
Author: 李奇原
Issued Date: 2012-05-30
Supervisor: 许舒人
Major: Computer Software and Theory
Degree Grantor: Graduate University of Chinese Academy of Sciences
Place of Degree Grantor: Beijing
Degree Level: Master's
Keyword: data-intensive applications; Hadoop workflow; BPEL
Abstract:

 

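Traditional workflow systems cannot meet enterprises' needs for building data-intensive applications, which call for the big-data processing capability of the Hadoop platform. Existing Hadoop workflow systems describe Hadoop workflows in their own custom languages and cannot communicate with an enterprise's existing workflow systems, so enterprises find it difficult to combine their existing system services with the Hadoop platform in workflows that process data-intensive applications. Building Hadoop workflows in BPEL keeps the strengths of BPEL as a traditional workflow language (rich expressiveness, integration as a single Web service, and support for long-running stateful interactions) while exploiting the Hadoop platform's ability to handle data-intensive applications, which makes it an effective solution to these problems.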
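This thesis first analyzes why Hadoop workflows are valuable for data-intensive applications and identifies several shortcomings of current Hadoop workflow systems: they cannot interact with existing enterprise workflow systems, their expressiveness is weak, and they lack workflow-level scheduling and monitoring. It then proposes a Hadoop-oriented data-intensive workflow system that addresses these problems, with research focused on three areas: a rule-based model transformation method, a fair scheduling method for Hadoop workflows, and runtime workflow monitoring.
For rule-based model transformation, the thesis defines a Hadoop workflow model and a BPEL model, establishes the transformation rules between them, and designs a rule-based transformation framework that efficiently converts Hadoop workflow models into BPEL models.
For fair scheduling, the thesis proposes FlowS, a fair scheduling method for Hadoop workflows. FlowS organizes workflows and allocates resources through workflow pools, which guarantees isolation between workflows, and uses a dynamic workflow-pool construction algorithm to distribute resources fairly among the workflows.
For runtime monitoring, the thesis persists the workflow model and updates views asynchronously to reduce the cost of view presentation. It also creates a monitoring instance for each active workflow to handle monitoring requests and detect failures, which ensures that workflows execute correctly and further reduces monitoring overhead.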
Finally, the thesis applies these research results to design and implement the Hadoop-oriented data-intensive workflow system.
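To make "rule-based model transformation" concrete, here is a minimal Python sketch. It is not the thesis's framework, whose models and rules are not given in the abstract. It assumes a hypothetical Hadoop workflow model with three node kinds (`action`, `sequence`, `fork`) and a rule table that maps each kind to a WS-BPEL 2.0 activity. An action becomes `<invoke>`, a sequence becomes `<sequence>`, and a fork becomes `<flow>`, whose children run concurrently. The class `Node`, the functions, the partner link `hadoopJobService`, the operation `runJob`, and the target namespace are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# WS-BPEL 2.0 executable-process namespace.
BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"
ET.register_namespace("bpel", BPEL_NS)


@dataclass
class Node:
    """Hypothetical Hadoop workflow node: kind is 'action', 'sequence' or 'fork'."""
    kind: str
    name: str
    children: List["Node"] = field(default_factory=list)


def bpel(local: str) -> str:
    """Qualified ElementTree tag for a BPEL element."""
    return f"{{{BPEL_NS}}}{local}"


def action_rule(node: Node, parent: ET.Element) -> ET.Element:
    # A job-like action becomes an <invoke>; partnerLink and operation are
    # hypothetical placeholders for a job-submission Web service.
    return ET.SubElement(parent, bpel("invoke"), name=node.name,
                         partnerLink="hadoopJobService", operation="runJob")


def sequence_rule(node: Node, parent: ET.Element) -> ET.Element:
    # Ordered steps map to <sequence>.
    return ET.SubElement(parent, bpel("sequence"), name=node.name)


def fork_rule(node: Node, parent: ET.Element) -> ET.Element:
    # Parallel branches map to <flow>, whose child activities run concurrently.
    return ET.SubElement(parent, bpel("flow"), name=node.name)


# The rule table: supporting a new node kind means adding one entry here.
RULES: Dict[str, Callable[[Node, ET.Element], ET.Element]] = {
    "action": action_rule,
    "sequence": sequence_rule,
    "fork": fork_rule,
}


def transform(node: Node, parent: ET.Element) -> ET.Element:
    """Apply the rule matching node.kind, then recurse into the children."""
    rule = RULES.get(node.kind)
    if rule is None:
        raise ValueError(f"no transformation rule for node kind {node.kind!r}")
    element = rule(node, parent)
    for child in node.children:
        transform(child, element)
    return element


def to_bpel(root: Node, process_name: str) -> ET.Element:
    """Wrap the transformed workflow in a BPEL <process> element."""
    process = ET.Element(bpel("process"), name=process_name,
                         targetNamespace="urn:example:hadoop-workflow")
    transform(root, process)
    return process


if __name__ == "__main__":
    workflow = Node("sequence", "main", [
        Node("action", "extract"),
        Node("fork", "analyze", [Node("action", "countWords"),
                                 Node("action", "buildIndex")]),
        Node("action", "report"),
    ])
    print(ET.tostring(to_bpel(workflow, "WordCountFlow"), encoding="unicode"))
```

Keeping the rules in a table, rather than inside one large conditional, means the mapping can be extended without touching the traversal. The output is only a process skeleton. A deployable BPEL process would also need `partnerLinks`, `variables`, and WSDL imports.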

 

English Abstract:

Because traditional workflow systems have limited data-processing capability, enterprises have to build data-intensive applications on the Hadoop platform. Existing Hadoop workflow systems describe Hadoop workflows in user-defined languages, which makes it difficult to communicate and integrate with existing workflow systems. Building Hadoop workflows in BPEL exploits both the advantages of BPEL as a traditional description language and the Hadoop platform's capacity for processing data-intensive applications, so it is an effective way to solve these problems.

This thesis analyzes the benefits of handling data-intensive applications with Hadoop workflows and several drawbacks of existing Hadoop workflow systems, including their inability to interact with enterprise workflow systems, weak expressiveness, and lack of workflow-level scheduling and monitoring. To solve these problems, the thesis proposes a Hadoop-oriented data-intensive workflow system and focuses on rule-based model transformation, fair scheduling, and run-time monitoring for Hadoop workflows. For rule-based model transformation, the thesis designs mapping rules and an efficient model transformation framework. For fair scheduling, it proposes FlowS, a fair scheduling method for Hadoop workflows that isolates Hadoop workflows from one another through workflow pools and ensures fair resource allocation through a dynamic pool construction algorithm. For run-time monitoring, the thesis persists the Hadoop workflow model and updates views asynchronously to reduce the workload of the presentation layer. In addition, it proposes creating a monitor instance, backed by a thread pool, for every active Hadoop workflow in order to identify failures and reduce monitoring overhead.

Finally, the thesis applies the above research results to the design and implementation of the Hadoop-oriented data-intensive workflow system.
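The abstract says FlowS isolates workflows in pools and allocates resources fairly through dynamic pool construction, but it does not give the algorithm. As a rough illustration only, the sketch below rebuilds one pool per workflow that has runnable tasks on each scheduling round. It then splits a fixed number of cluster task slots among those pools by max-min fairness (water-filling), a standard fair-sharing technique used here in place of the unpublished FlowS details. `WorkflowPool`, `build_pools`, `max_min_share`, and the slot model are assumptions, not FlowS itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WorkflowPool:
    """Hypothetical pool holding the runnable tasks of one active workflow."""
    workflow_id: str
    pending_tasks: int  # tasks of this workflow that are ready to run now


def build_pools(runnable: Dict[str, int]) -> List[WorkflowPool]:
    """Rebuild the pools for this round: one per workflow that has work."""
    return [WorkflowPool(wf, n) for wf, n in runnable.items() if n > 0]


def max_min_share(pools: List[WorkflowPool], total_slots: int) -> Dict[str, int]:
    """Split integer task slots among pools by max-min fairness (water-filling).

    No pool gets more slots than it has pending tasks, and slots a pool
    cannot use are shared equally among the pools that still want more.
    """
    alloc = {p.workflow_id: 0 for p in pools}
    active = list(pools)
    remaining = total_slots
    while remaining > 0 and active:
        share = remaining // len(active)
        if share == 0:
            # Fewer slots than hungry pools: one slot each, in pool order.
            for p in active[:remaining]:
                alloc[p.workflow_id] += 1
            break
        still_hungry = []
        for p in active:
            give = min(share, p.pending_tasks - alloc[p.workflow_id])
            alloc[p.workflow_id] += give
            remaining -= give
            if alloc[p.workflow_id] < p.pending_tasks:
                still_hungry.append(p)
        active = still_hungry
    return alloc


if __name__ == "__main__":
    pools = build_pools({"etl": 2, "wordcount": 10, "pagerank": 10, "idle": 0})
    print(max_min_share(pools, 12))  # {'etl': 2, 'wordcount': 5, 'pagerank': 5}
```

Each workflow draws only from its own pool's share, which is the sense of "isolation" in this sketch. Because pools are rebuilt every round, a finished workflow stops consuming share at once and a newly submitted workflow gets a pool on the next round. The abstract does not say whether FlowS rebuilds its pools this way.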

Language: English
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/14496
Appears in Collections: Technology Center of Software Engineering, Theses

Files in This Item:
File Name: 李奇原大论文-打印版2.pdf (2225 KB)
Access: Restricted; contact the repository to obtain the full text

Recommended Citation:
李奇原. 面向Hadoop平台的数据密集型工作流系统的设计与实现[D]. 北京. 中国科学院研究生院. 2012-05-30.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

Copyright © 2007-2020 Institute of Software, Chinese Academy of Sciences