English Abstract: | Because traditional workflow systems have limited data-processing capability, enterprises have to build data-intensive applications on the Hadoop platform. Existing Hadoop workflow systems define workflows in their own user-defined languages, which makes it difficult for them to communicate and integrate with existing workflow systems. Building Hadoop workflows with BPEL draws on both the strengths of BPEL as a traditional description language and the Hadoop platform's capacity for processing data-intensive applications, so it is an effective way to address these problems. This thesis analyzes the benefits of handling data-intensive applications with Hadoop workflows and the drawbacks of existing Hadoop workflow systems, which include the inability to interact with enterprise workflow systems, weak descriptive ability, and a lack of workflow-level scheduling and monitoring. To solve these problems, the thesis proposes a Hadoop-oriented data-intensive workflow system and focuses on three topics: rule-based model transformation, a fair scheduling method, and run-time monitoring for Hadoop workflows. For rule-based model transformation, the thesis designs mapping rules and an efficient model transformation framework. For fair scheduling, the thesis proposes FlowS, a fair scheduling method that isolates Hadoop workflows through workflow pools and ensures fair resource allocation through a dynamic construction algorithm. For run-time monitoring, the thesis persists the Hadoop workflow model and updates the view asynchronously to reduce the workload of the presentation layer. It also proposes an approach that creates a monitor instance for every active Hadoop workflow, together with a thread pool, which identifies failures and reduces monitoring overhead. Finally, the thesis discusses the design and implementation of the Hadoop-oriented data-intensive workflow system. The research results above are applied in this system. |