Research on Key Technologies of Sample Based Scheduling under Big Data Context (大数据背景下集群取样调度关键技术研究)

Title	大数据背景下集群取样调度关键技术研究
Other title	Research on Key Technologies of Sample Based Scheduling under Big Data Context
Author	Hao Chunliang (郝春亮)
Major	Computer Software and Theory
Advisor	Li Mingshu (李明树)
	2017-05-22
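Degree-granting institution	University of Chinese Academy of Sciences
Degree	Doctorate
Place of degree conferral	Beijing
Keywords	cluster scheduling; sampling method; scheduling precision; big data; interactive jobs; random walk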
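Other abstract	In recent years, with the rapid development of big data environments, data processing clusters have kept growing in scale. Clusters made up of several thousand servers are no longer rare, and the clusters of some leading companies contain more than ten thousand servers. In such large-scale clusters, scheduling methods face severe challenges: traditional schedulers built on centralized decision logic cannot scale out in parallel and incur high scheduling latency, so they risk becoming a performance bottleneck; moreover, because they require global synchronization, centralized schedulers can hardly deliver the millisecond-level latency that interactive jobs need at current cluster scales. Researchers have therefore begun to explore other scheduling methods, and sample-based scheduling is one of the most active research topics among them. The core advantages of sample-based scheduling are its fully distributed structure, its simple scheduling process, and its extremely low lower bound on scheduling latency; in theory these properties are better suited to the job requirements of today's large-scale clusters. At present the method mostly serves fast data processing frameworks, of which Spark is representative. Sample-based scheduling comprises three main parts: cluster resource abstraction, basic job scheduling, and support for advanced scheduling features. As a relatively new research direction, sample-based cluster scheduling still has unresolved technical difficulties in each of these three parts, which limit its applicability in real clusters. This thesis studies and analyzes each part in depth and improves the sampling method while preserving its high scalability and low latency. Through these improvements, the thesis forms a preliminary set of low-cost, high-precision sample-based scheduling methods. Specifically, the main contents and contributions are as follows: 1. An empirical analysis of current cluster scheduling methods under the big data context: it summarizes the main problems, focal points, and development directions of current cluster scheduling, providing direction for the sample-based scheduling research in this thesis. 2. In the resource abstraction part, a study of the scheduling precision problem caused by deficiencies in resource representation: the thesis argues that the cause is that all speculation about resource status is performed on the scheduler side, based only on numbers that denote coarse-grained resource units. It then proposes a resource status representation based on empirical speculation on the worker side, and develops the scheduler prototype Sparkle. Experiments show that Sparkle effectively mitigates the imprecise decisions caused by resource abstraction. 3. In the basic job scheduling part, a study of the scheduling precision problem caused by suboptimal decisions: the thesis proposes the private cluster state (PCS) method, in which each scheduler accumulates and analyzes information on its own to obtain the global resource state and avoid suboptimal decisions in a non-cooperative, low-cost manner with no additional modules to deploy. Following this method, the scheduler PCSsampler was developed as an improvement on the open-source project Sparrow; experimental results show that PCSsampler effectively reduces suboptimal decisions and achieves significantly shorter job execution times than Sparrow. 4. In the advanced scheduling feature part, a study of how sample-based scheduling can support global scheduling considerations: taking gang scheduling as an example, the thesis proposes gang scheduling logic based on private cluster state and puts forward the applicability of this approach to other global scheduling considerations. Validation under different industrial workloads shows that the gang scheduling success rate of this method approaches the theoretical optimum in most cases. 5. Finally, a study of the boundary of sample-based scheduling itself, namely an in-depth exploration of its lower bound on latency: based on an analysis of the three communication steps in sample-based scheduling, and treating a sampling ratio of 1 as a special case, the thesis proposes a low-latency cluster scheduling method based on random walk. The method is implemented in the project Tiresias. Results on an Amazon cluster and on a simulated cluster both indicate that Tiresias effectively lowers the latency lower bound of existing sampling methods and is very likely one of the cluster scheduling methods with the lowest necessary latency under the big data context. ; In recent years, with the rapid development of big data, data analytics clusters have grown ever larger. Clusters containing thousands of nodes are no longer uncommon, and some have tens of thousands of nodes. Such sizes pose a significant challenge to cluster scheduling: the most common centralized scheduling methods have shown limitations in several respects, such as poor scalability and long scheduling delay, and may therefore become a bottleneck. More importantly, complex centralized decision logic can hardly serve the needs of interactive jobs, which are crucial in today's big data context. 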
As a result, researchers have searched for alternative scheduling methods, among which sample-based scheduling is one of the most promising. The advantage of sample-based scheduling is twofold: first, its scalable design makes it convenient to accommodate the cluster and workload sizes of today and the near future; second, its simple and fast decision logic is especially suitable for serving high-fanout interactive workloads. Consequently, many clusters have adopted this approach, especially on the Spark platform. However, as an emerging method, it still has some major problems: (1) the sampling process limits the method's precision; (2) it cannot yet support some important global scheduling concerns; (3) its necessary scheduling delay consists mostly of three sequential communications. This thesis aims to mitigate these problems of sample-based scheduling while preserving its high scalability and low latency. Specifically, the contributions of this thesis are as follows: 1. Conduct a short survey of current representative cluster scheduling research, guiding the research on sample-based scheduling in this thesis through quantitative results acquired from the survey. 2. Introduce Sparkle. Sparkle mitigates the precision problem of sample-based scheduling with a fine-grained, speculation-based resource abstraction technique. Sparkle moves the responsibility for predicting worker status from the scheduler to the worker end, reducing mismatched information; it also introduces a worker blacklist to temporarily block a busy worker when the worker itself reports unavailability. Sparkle has been shown to effectively improve job delay. 3. Introduce PCSsampler. PCSsampler is an enhanced version of sample-based scheduling that uses private cluster state (PCS). PCS converts sample-based scheduling from a stateless process into a state-based process. 
It requires the scheduler to keep caching information at each decision, forming an approximation of the real-time cluster state, the PCS. The design of PCSsampler ensures that the half-accurate, half-expired information in the PCS can still lead to useful scheduling suggestions. Experiments show that PCSsampler significantly reduces suboptimal scheduling decisions in sample-based scheduling and improves scheduling precision. 4. Improve the support for gang scheduling in the sample-based method based on PCS. Introduce a novel gang scheduling process with PCS that brings gang scheduling close to that of an omnipotent, centralized scheduler without preemption. 5. Introduce Tiresias. Tiresias is a low-latency cluster scheduling method that explores the lower bound of cluster scheduling delay. It is based on task placement logic similar to a random walk. As a result, in some contexts it needs only one necessary communication before a task is placed. Experimental results support its claim of lower scheduling delay than sample-based methods in most contexts.
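Language	Chinese
Content type	Dissertation
URI identifier	http://ir.iscas.ac.cn/handle/311060/19004
Collection	National Engineering Research Center of Fundamental Software (基础软件国家工程研究中心)
Author affiliation	Institute of Software, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Hao Chunliang (郝春亮). 大数据背景下集群取样调度关键技术研究 [Research on Key Technologies of Sample Based Scheduling under Big Data Context][D]. Beijing. University of Chinese Academy of Sciences, 2017.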

Files in this item
File name/size	Document type	Version type	Access type	License
thesis-4.pdf (9313KB)	Dissertation		Open access	CC BY-NC-SA