大数据背景下集群取样调度关键技术研究
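Alternative Title: Research on Key Technologies of Sample Based Scheduling under Big Data Context
郝春亮
Major: Computer Software and Theory
Supervisor: 李明树
2017-05-22
Degree Grantor: University of Chinese Academy of Sciences
Degree Level: Doctoral
Place of Degree Grantor: Beijing
Keywords: cluster; scheduling; sampling method; scheduling precision; big data; interactive jobs; random walk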
Abstract

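In recent years, with the rapid development of big data environments, data-processing clusters have kept growing in scale. Clusters of several thousand servers are no longer rare, and the clusters of some leading companies contain more than ten thousand servers. In such large clusters, scheduling faces severe challenges: traditional scheduling methods built on centralized decision logic cannot scale out in parallel and incur high scheduling delay, so they risk becoming a performance bottleneck; moreover, because they require global synchronization, centralized methods can hardly provide the millisecond-level latency that interactive jobs need at current cluster sizes. Researchers have therefore begun to explore other scheduling methods, and sample-based scheduling is one of the research hotspots.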

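The core advantages of sample-based scheduling are its fully distributed structure, its simple scheduling process, and its very low lower bound on scheduling delay; in theory, these properties suit the job requirements of today's large-scale clusters better. At present the method mostly serves fast data-processing frameworks represented by Spark. Sample-based scheduling consists of three main parts: cluster resource abstraction, basic job scheduling, and support for advanced scheduling features. As a relatively new research direction, cluster sample-based scheduling still has unresolved technical difficulties in all three parts, which limit its current applicability in real clusters.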
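As a rough illustration of the basic idea behind sample-based scheduling, the sketch below probes a few randomly chosen workers and places the task on the least-loaded one. It is not taken from the thesis: the function names, the use of queue length as the load metric, and the parameters are all assumptions made for illustration.

```python
import random


def sample_and_place(queue_lengths, sample_size, rng):
    """Probe `sample_size` random workers and place one task on the least loaded.

    queue_lengths: list of pending-task counts, one per worker. This is a
    hypothetical stand-in for whatever resource abstraction a real system uses.
    """
    probed = rng.sample(range(len(queue_lengths)), sample_size)
    # Pick the probed worker with the shortest queue.
    chosen = min(probed, key=lambda w: queue_lengths[w])
    queue_lengths[chosen] += 1
    return chosen


def max_queue_after(num_workers, num_tasks, sample_size, seed=0):
    """Place `num_tasks` tasks one by one and report the longest queue."""
    rng = random.Random(seed)
    queues = [0] * num_workers
    for _ in range(num_tasks):
        sample_and_place(queues, sample_size, rng)
    return max(queues)
```

No central component holds the global state here: each decision needs only `sample_size` probes, which is why the approach scales. The price is that a small sample can miss better workers, which is the precision problem the thesis addresses.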

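This thesis studies and analyzes each part of the sampling method in depth and improves the method while preserving its high scalability and low latency. Through these improvements, the thesis forms a preliminary low-cost, high-precision sample-based scheduling method. Specifically, the main content and contributions of the thesis are as follows: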

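1. An empirical analysis of current cluster scheduling methods in the big data context: it summarizes the main problems, points of focus, and development directions of cluster scheduling today, providing direction for the sample-based scheduling research in this thesis.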

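2. For resource abstraction, a study of the scheduling-precision problem caused by deficiencies in resource representation: the thesis argues that the cause is that all speculation about resource state is done on the scheduler side, based only on numbers that represent coarse-grained resource units. It then proposes a resource state representation based on experience-driven speculation on the worker side, and develops the scheduler prototype Sparkle. Experiments show that Sparkle effectively alleviates the imprecise decisions caused by resource abstraction.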

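3. For basic job scheduling, a study of the scheduling-precision problem caused by sub-optimal decisions: the thesis proposes the private cluster state method, in which each scheduler accumulates and analyzes information on its own to obtain the global resource state and avoid sub-optimal decisions, in a non-cooperative, low-cost manner that requires no additional modules to be deployed. Following this method, the thesis develops the scheduler PCSsampler as an improvement on the open-source project Sparrow. Experimental results show that PCSsampler effectively reduces sub-optimal decisions and achieves significantly shorter job execution time than Sparrow.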

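4. For advanced scheduling feature support, a study of how sample-based scheduling can support global scheduling concerns: taking gang scheduling as an example, the thesis proposes gang scheduling logic based on private cluster state and discusses its applicability to other global scheduling concerns. Validation under different industrial workloads shows that the gang scheduling success rate of this method approaches the theoretical optimum in most cases.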

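5. Finally, a study of the boundary of sample-based scheduling itself, namely an in-depth exploration of its lower bound on delay: based on an analysis of the three communication steps in sample-based scheduling, and taking the special case in which the sampling ratio is 1, the thesis proposes a low-latency cluster scheduling method based on random walk. The method is implemented in the project Tiresias. Results on an Amazon cluster and on simulated clusters both show that Tiresias effectively lowers the delay lower bound of existing sampling methods, and it is likely one of the cluster scheduling methods with the lowest necessary delay in the current big data context.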
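The random-walk placement mentioned in item 5 can be pictured with a minimal sketch. It assumes a task is sent directly to one random worker and that a busy worker forwards it to another random worker; the hop limit, the boolean busy flag, and all names are hypothetical, and the actual Tiresias logic may differ.

```python
import random


def random_walk_place(busy, max_hops, rng):
    """Forward a task from worker to worker until an idle one accepts it.

    busy: list of booleans, True if the worker is currently occupied.
    Returns (worker_index, hops), or (None, max_hops) if no idle worker
    was reached within the hop limit. Each hop is one message.
    """
    current = rng.randrange(len(busy))
    for hops in range(1, max_hops + 1):
        if not busy[current]:
            busy[current] = True  # the worker accepts the task immediately
            return current, hops
        # A busy worker forwards the task to a randomly chosen worker.
        # (Simplification: the same worker may be picked again.)
        current = rng.randrange(len(busy))
    return None, max_hops
```

When the first worker contacted is idle, the task is placed after a single message, which matches the idea of reducing the three sequential communications of sample-based scheduling to one in favorable cases.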

In recent years, with the rapid development of big data, data analytics clusters have grown ever larger. Clusters containing thousands of nodes are no longer uncommon, and some have tens of thousands of nodes. Such scale poses a significant challenge to cluster scheduling: the most common approach, centralized scheduling, has shown limitations in several respects, such as poor scalability and long scheduling delay, and may therefore become a bottleneck. More importantly, complex centralized decision logic can hardly serve the needs of interactive jobs, which are crucial in today's big data context. As a result, researchers have searched for alternative scheduling methods, among which sample-based scheduling is one of the most promising.

The advantage of sample-based scheduling is twofold: first, its scalable design makes it convenient to accommodate the cluster and workload sizes of today and the near future; second, its simple and fast decision logic is especially suitable for serving high-fanout interactive workloads. Consequently, many clusters have adopted this approach, especially on the Spark platform. However, as an emerging method, it still has some major problems: (1) the sampling process limits the method's precision; (2) it currently cannot support some important global scheduling concerns; (3) its necessary scheduling delay consists mostly of three sequential communications.

This thesis aims to mitigate the above-mentioned problems of sample-based scheduling while preserving its high scalability and low latency. Specifically, the contributions of this thesis are as follows:

1. Conduct a short survey of current representative cluster scheduling research, guiding the research of sample-based scheduling in this thesis through quantitative results acquired from the survey.

2. Introduce Sparkle. Sparkle mitigates the precision problem of sample-based scheduling with a fine-grained, speculation-based resource abstraction technique. Sparkle moves the responsibility for predicting worker status from the scheduler to the worker end, reducing mismatched information; it also introduces a worker blacklist that temporarily blocks a busy worker if the worker itself reports that it is unavailable. Sparkle has been shown to effectively improve job delay.

3. Introduce PCSsampler. PCSsampler is an enhanced version of sample-based scheduling that uses private cluster state (PCS). PCS converts sample-based scheduling from a stateless process to a stateful one. It requires each scheduler to cache information at each decision, forming an approximation of the real-time cluster state, the PCS. The design of PCSsampler ensures that the half-accurate, half-expired information in the PCS can still lead to useful scheduling suggestions. Experiments show that PCSsampler significantly cuts sub-optimal scheduling decisions in sample-based scheduling and improves scheduling precision.

4. Improve the support for gang scheduling in the sample-based method on the basis of PCS. Introduce a novel PCS-based gang scheduling process that brings gang scheduling results close to those of an omnipotent, centralized scheduler without preemption.

5. Introduce Tiresias. Tiresias is a low-latency cluster scheduling method that explores the lower bound of cluster scheduling delay. It is based on task placement logic similar to a random walk. As a result, in some contexts it needs only one necessary communication before a task is placed. Experimental results support its claim of lower scheduling delay than sample-based methods in most contexts.
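The private cluster state described in item 3 can be sketched as a per-scheduler cache of probe results. The abstract does not say how PCSsampler weighs stale entries, so the fixed expiry age used below is an assumption, as are the class, method, and parameter names.

```python
import random


class PrivateClusterState:
    """Per-scheduler cache of probe results (hypothetical structure)."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.cache = {}  # worker id -> (queue length, time observed)

    def record(self, worker, queue_len, now):
        self.cache[worker] = (queue_len, now)

    def candidates(self, now):
        """Return cached queue lengths that have not yet expired."""
        return {w: q for w, (q, t) in self.cache.items()
                if now - t <= self.max_age}


def place_with_pcs(pcs, queue_lengths, sample_size, now, rng):
    """Probe a few workers, then choose among fresh and cached observations."""
    for w in rng.sample(range(len(queue_lengths)), sample_size):
        pcs.record(w, queue_lengths[w], now)
    known = pcs.candidates(now)  # never empty: fresh probes were just recorded
    chosen = min(known, key=known.get)
    queue_lengths[chosen] += 1
    # The scheduler updates its own view of the worker it just used.
    pcs.record(chosen, queue_lengths[chosen], now)
    return chosen
```

Because earlier observations are reused, each decision effectively considers more workers than it probes, without any coordination between schedulers. Cached entries may be out of date, which is why this sketch discards them after `max_age`.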

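Language: Chinese
Content Type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/19004
Collection: National Engineering Research Center of Fundamental Software (基础软件国家工程研究中心)
Affiliation: Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)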
Recommended Citation
GB/T 7714
郝春亮. 大数据背景下集群取样调度关键技术研究[D]. 北京. 中国科学院大学,2017.
Files in This Item:
File Name/Size | DocType | Version | Access | License
thesis-4.pdf (9313KB) | Dissertation | | Open Access | CC BY-NC-SA

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.