中国科学院软件研究所机构知识库
Advanced  
ISCAS OpenIR  > 中科院软件所  > 中科院软件所
题名:
数据流管理系统中查询处理和实时调度技术研究
作者: 李新
答辩日期: 2008-01-16
授予单位: 中国科学院软件研究所
授予地点: 软件研究所
学位: 博士
关键词: 数据流 ; 连续查询 ; 周期性查询 ; 一次查询 ; 实时查询 ; 滑动窗口 ; 查询处理 ; 查询优化 ; 实时调度 ; 反馈控制 ; 截止期
其他题名: Query Processing and Real-time Scheduling In a Data Stream Management System
摘要: 随着信息处理技术在通信、金融、工业生产等领域的广泛应用,数据已不仅仅拘泥于文件、数据表等传统的静态形式。大量连续、变化的流式数据已经在越来越多的现代应用中出现,如:军事指挥、交通控制、传感器数据处理、网络监控、金融数据分析等。在这些应用中,数据以流的形式不断到达,系统需要对这些数据进行及时、连续的处理。如果查询处理时间超过了截止期要求,查询结果就没有意义,甚至会造成灾难性后果。目前大多数研究集中于连续查询的处理与调度,对于以满足截止期为目标的实时查询处理与调度研究比较少,因此本文主要讨论数据流管理系统中实时查询处理问题。 本文首先系统地分析了数据流系统的应用特点、需求和研究现状,在此基础上提出了一种由周期性查询、连续查询和一次查询构成的混合实时查询模型,对各种查询的定义和属性进行了形式化描述。为了方便查询的定义与表示,本文设计了一种类SQL的实时连续查询语言RT-CQL(Real-Time Continuous Query Language)。与CQL语言相比,RT-CQL增加了对周期、截止期、采样率等实时性和近似性特征的描述。 其次,本文结合实时查询的特点,给出了一种实时查询处理框架,讨论了选择、投影、连接等基本操作符的设计与实现。针对多查询共享问题,本文提出了一种带共享段的操作符路径图构造算法。该算法可以根据多个查询计划构造包含公共操作符路径段的操作符路径图,消除其中重复的操作符,提高查询处理效率。 接着,本文针对共享滑动窗口连接问题提出一种相对截止期优先连接算法RDF-SJoin(Relative Deadline First Shared window Join)和一种绝对截止期优先连接算法EDF-SJoin(Earliest Deadline First Shared window Join)。RDF-SJoin算法优先进行相对截止期较短的窗口上的连接操作,缩短其响应时间。EDF-SJoin算法按照元组的绝对截止期从早到晚的顺序进行连接操作。仿真测试结果表明,这两种算法都能减少查询的截止期错失率。与RDF-SJoin算法相比,EDF-SJoin算法能更公平的执行共享连接操作,避免查询执行中的“饿死问题”。 再次,本文针对多查询的实时调度问题提出一种以操作符路径为单位的最早截止期优先调度算法OP-EDF(Operator-Path Earliest Deadline First),并从调度能力、响应时间、系统开销等方面对算法性能进行了分析。然后,针对流速爆发情况下OP-EDF算法系统开销较大的问题,提出两种改进的内部调度算法—OP-EDF-Batch和OP-EDF-Gate,实现了高效的元组批处理调度。实验结果表明两种改进的算法可以有效地应用于流速爆发环境中的数据流实时查询处理。 进一步,本文把反馈控制思想应用到混合查询调度中,提出了一种基于反馈控制的混合查询调度FC-TBS(Feedback Control Total Bandwidth Server)算法。该算法根据周期性查询和非周期性查询的负载情况和截止期错失率,动态分配处理器时间。试验结果表明,与其它混合调度算法相比,FC-TBS算法在保证周期性查询截止期的情况下能够降低非周期性查询的截止期错失率,并能通过自适应调节CPU利用率参数的方法提高查询处理系统的整体服务质量。 最后,设计开发了一个实时数据流管理原型系统RT-DSMS,用于相关策略与算法的性能评估。这个原型系统提供了良好的可扩展性与可配置性,支持对新的查询调度算法、负载管理策略的测试与分析。 以满足截止期和降低截止期错失率为目标的数据流管理系统研究具有较高的应用价值和良好的应用前景。本文的研究成果为进一步探讨实时查询处理与混合查询的调度,以及实际应用中的混合查询负载管理提供了良好的基础。
英文摘要: More and more modern applications need to process data streams that consist of unbounded data items generated in a continuous fashion. Examples of such applications include military locations monitoring, traffic control, sensor-based monitoring and online finance analysis. Such applications receive unbounded input streams that are processed against a set of pre-defined queries in a real-time manner. For these applications, queries have to be completed within certain deadlines for the results to be of full value. Traditional database management systems, which are designed to process snapshot queries over finite stored datasets, can not satisfy these new requirements. The demand for real-time streaming applications has attracted a great deal of research interests in a new class of systems called Data Stream Management Systems (DSMSs) from both the database and real-time computing communities. This thesis addresses several research challenges to build a real-time data stream management system, including real-time query model, query processing & optimization and real-time query scheduling. To our best knowledge, this is the first work that integrates and schedules periodic, continuous and one-time queries with deadlines in a data stream system. At fist, a mixed real-time query model composed of periodic, continuous and one-time queries with deadlines is presented to handle multiple queries with timing constraints. Extended from the Continuous Query Language (CQL), a Rea-Time Continuous Query Language (RT-CQL) is designed to define and describe the real-time queries. The main difference between the RT-CQL and the CQL is that the former has additional support for the properties of real-time queries, such as period, deadline and sample ratio. To address shared window join in a real-time query model, this thesis proposes a Relative Deadline First Shared window Join algorithm (RDF-SJoin) and an Earliest Deadline First Shared window Join algorithm (EDF-SJoin). The simulation results show that, both of the proposed algorithms can reduce the number of deadline violations of real-time continuous queries. Moreover, the EDF-SJoin can execute the join operation more fairly than RDF-SJoin and avoid “starvation problem”. And then, this thesis discusses real-time scheduling for continuous queries. The execution of one tuple passing through an operator path is modeled as a real-time task instance. A fine-grained scheduling strategy named OP-EDF (Operator-Path Earliest Deadline First) is proposed for real-time scheduling, which schedules the operator path with the earliest deadline of the waiting tuples at any time slot. The performance of the OP-EDF is analyzed from three aspects: schedulability, response time and system overhead. Furthermore, two improved batch scheduling algorithms, termed OP-EDF-Batch and OP-EDF-Gate, are introduced to decrease system overhead of the OP-EDF. The experimental results show that the improved scheduling algorithms are effective in real-time query processing for data streams with bursting arrival rates. Furthmore, to execute multiple classes of queries, an adaptive query scheduling, termed FC-TBS (Feedback-Control-based Total Bandwidth Server), is proposed in which the Earliest Deadline First strategy is used to schedule periodic queries and a bandwidth preserving strategy – Total Bandwidth Server – for continuous and one-time queries. A feedback control method is adopted to dynamically adjust the ratio of preserved resource usage referred to the recent history of query miss ratio. The objective of the FC-TBS scheduling is to guarantee deadlines of periodic queries and to reduce deadline miss ratio of aperiodic queries, while maximizing the overall query quality according to load management strategy. Experimental results show that the FC-TBS strategy is more effective than other mixed scheduling algorithms and can deal with workload fluctuations gracefully. At last, a DSMS prototype, named RT-DSMS, is developed as a testing system to evaluate the performance of the proposed query model and algorithms. This prototype can provide full extendibility and configurability to support the test and analysis of new real-time scheduling algorithms and load management strategies. The work in this thesis provides the foundation for further research on real-time query processing, scheduling and load management in actual application systems.
语种: 中文
内容类型: 学位论文
URI标识: http://ir.iscas.ac.cn/handle/311060/5764
Appears in Collections:中科院软件所

Files in This Item:
File Name/ File Size Content Type Version Access License
10001_200418015029057李新_paper.pdf(1290KB)----限制开放-- 联系获取全文

Recommended Citation:
李新. 数据流管理系统中查询处理和实时调度技术研究[D]. 软件研究所. 中国科学院软件研究所. 2008-01-16.
Service
Recommend this item
Sava as my favorate item
Show this item's statistics
Export Endnote File
Google Scholar
Similar articles in Google Scholar
[李新]'s Articles
CSDL cross search
Similar articles in CSDL Cross Search
[李新]‘s Articles
Related Copyright Policies
Null
Social Bookmarking
Add to CiteULike Add to Connotea Add to Del.icio.us Add to Digg Add to Reddit
所有评论 (0)
暂无评论
 
评注功能仅针对注册用户开放,请您登录
您对该条目有什么异议,请填写以下表单,管理员会尽快联系您。
内 容:
Email:  *
单位:
验证码:   刷新
您在IR的使用过程中有什么好的想法或者建议可以反馈给我们。
标 题:
 *
内 容:
Email:  *
验证码:   刷新

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Valid XHTML 1.0!
Copyright © 2007-2017  中国科学院软件研究所 - Feedback
Powered by CSpace