Institutional Repository of the Institute of Software, Chinese Academy of Sciences
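Title:
Research on Load Shedding Methods for Real-Time Data Stream Systems (实时数据流系统降载方法研究)
Author: Ma Li (马力)
Defense date: 2009-01-18
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software
Degree: Doctoral
Keywords: data stream; data stream management system; real-time scheduling; load shedding; semantic load shedding; random load shedding; sliding window; shared window join; priority table
Alternative title: Load Shedding in a Real-time Data Stream System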
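Abstract: With the development of information technology, a large number of data stream applications have emerged, such as sensor data processing, network monitoring, and financial data analysis. In these applications, data arrive as a continuous, time-varying, ordered, and unbounded sequence, and most queries are continuous queries. This continuity of data and queries places heavy demands on system resources. When system resources cannot meet the requirements of query processing, that is, when the query workload exceeds the system's maximum processing capacity, user queries cannot be processed promptly and correctly. Moreover, if the processing time of a query exceeds its deadline, the result is meaningless and may even cause disastrous consequences. At present, much research focuses on load shedding in data stream systems, while comparatively little addresses load shedding in real-time data stream systems that support real-time query processing.
This dissertation studies load shedding methods in real-time data stream management systems that support real-time query processing, covering three aspects: random load shedding, semantic load shedding, and load shedding for shared sliding window join operations. Experiments on a real-time data stream management system test platform show that the proposed algorithms perform well in raising system throughput and lowering the deadline miss ratio.
To address the needs of real-time data stream applications, a data stream processing architecture suited to real-time queries, RT-DSPA, and a corresponding multi-level overload handling strategy, MLOHS, are proposed, providing a framework for the study of load shedding methods. RT-DSPA is divided into functional modules across a user layer, a DSMS layer, and a data source layer, and is multi-layered, extensible, robust, and configurable.
For random load shedding, a load estimation algorithm based on data stream rates is proposed. Building on the architecture and the load estimation algorithm, a deadline-sensitive random load shedding algorithm, RLS-EDA, is then proposed. Because system load often fluctuates widely, the algorithm exploits query deadlines and temporarily buffers the dropped tuples so that idle CPU resources can be fully used, which raises throughput after shedding and thereby lowers the query deadline miss ratio as far as possible. Finally, queue maintenance strategies during shedding, load shedding locations in query networks containing shared operators, and an algorithm for inserting load shedding operators into the query network are discussed. Experimental results show that RLS-EDA performs well when system load fluctuates widely.
When the characteristics of the data streams and queries are well understood, semantic load shedding achieves better shedding results. To make explicit the semantics used in semantic load shedding, the concepts of tuple value and value degree are introduced, and a method is given for resolving conflicts that arise when assigning value degrees. A value degree-execution cost priority table and a deadline-value density priority table suited to real-time data stream management systems are designed, which allow multi-dimensional factors to be considered when determining priority. Based on these two priority tables, the corresponding semantic load shedding algorithms SLS-PT-VD&EC and SLS-PT-D&TVD are proposed. Priority-table-based semantic load shedding can flexibly meet different user requirements while improving system performance during shedding.
Finally, for overload of shared sliding window join operators, a load shedding algorithm for shared sliding window joins, LS-SJRT, is proposed, which exploits query deadlines and is based on temporarily buffering dropped tuples. To reduce the shedding overhead of LS-SJRT, an improved algorithm, LS-SJRT-CW, is proposed, which adjusts the sliding window width. Experimental results show that both algorithms perform well when the shared join operator is overloaded.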
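The abstract does not give the details of RLS-EDA, so the following is only a minimal Python sketch of the general idea it describes: estimate load from stream rates, drop tuples at random when overloaded, and keep the dropped tuples in a temporary buffer so that they can still be processed during idle periods if their deadlines have not passed. The names `StreamTuple`, `BufferingShedder`, and `drop_probability`, the per-tuple cost model, and the drop-rate formula are assumptions made for illustration and are not taken from the dissertation.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class StreamTuple:
    value: int
    deadline: float  # absolute time after which a result is useless


def drop_probability(rates, costs, capacity):
    """Assumed load model: load = sum(rate_i * cost_i) in CPU-seconds per second.

    Returns the fraction of tuples to drop so that the remaining load
    fits into `capacity` (0.0 when the system is not overloaded).
    """
    load = sum(r * c for r, c in zip(rates, costs))
    if load <= capacity:
        return 0.0
    return 1.0 - capacity / load


class BufferingShedder:
    """Random shedder that stashes dropped tuples instead of discarding them."""

    def __init__(self, drop_prob, seed=0):
        self.drop_prob = drop_prob
        self.rng = random.Random(seed)
        self.stash = deque()  # dropped tuples, oldest first

    def admit(self, tup):
        """Return True if the tuple is processed now, False if it is stashed."""
        if self.rng.random() < self.drop_prob:
            self.stash.append(tup)
            return False
        return True

    def recover(self, now, budget):
        """On idle CPU time, return up to `budget` stashed tuples that can
        still meet their deadline; expired tuples are discarded for good."""
        recovered = []
        while self.stash and len(recovered) < budget:
            tup = self.stash.popleft()
            if tup.deadline > now:
                recovered.append(tup)
        return recovered
```

In this sketch a scheduler would call `admit` on each arriving tuple and call `recover` whenever the load estimate falls below capacity. How the actual algorithm sizes the buffer or orders recovered tuples is not stated in the abstract.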
English abstract: With the development of information technology, a large number of data stream applications have emerged, such as sensor data processing, network monitoring, and financial data analysis. In these applications, a data stream is a continuous, ordered, time-varying, infinite sequence, and most queries are continuous queries. Processing continuous data and continuous queries requires a great deal of system resources. When system resources cannot meet the requirements of query processing, that is, when processing demand exceeds the maximum capacity of the system, queries are not answered correctly or in time. Moreover, if the processing time of a query exceeds its deadline, the result is meaningless and may even lead to disastrous consequences. At present, many studies focus on load shedding in data stream management systems, but relatively few consider load shedding in real-time data stream management systems, where queries have deadline requirements.
This dissertation discusses load shedding approaches in a real-time data stream system that supports real-time queries, covering three areas: random load shedding, semantic load shedding, and load shedding for shared sliding window joins. Experimental results show that the proposed algorithms improve system throughput and reduce the query deadline miss ratio.
For real-time data stream applications, this dissertation proposes a Data Stream Processing Architecture for Real-time query (RT-DSPA) and a Multi-Level Overload Handling Strategy (MLOHS) to provide a framework for these load shedding approaches. RT-DSPA is divided into a user layer, a DSMS layer, and a data source layer, and is scalable, robust, and configurable.
For random load shedding, a load estimation algorithm based on data stream rates is first proposed, and then an Effective Deadline-Aware Random Load Shedding algorithm (RLS-EDA), built on the proposed architecture and load estimation algorithm, is presented. RLS-EDA accounts for large fluctuations in data streams, exploits query deadlines, and temporarily buffers dropped tuples to make full use of idle system resources. The objective of the algorithm is to increase system throughput as much as possible and thereby decrease the query deadline miss ratio. The dissertation also discusses queue maintenance strategies, load shedding locations in query networks with shared operators, and an algorithm for inserting load shedders into the query network. Experimental results show that the proposed random load shedding algorithm performs well under large load fluctuations.
When the characteristics of data streams and queries are well understood, semantic load shedding performs better. The dissertation defines tuple value and tuple value degree, and gives a method for resolving conflicts when assigning value degrees. To better meet user requirements, multi-dimensional factors are considered when dropping tuples, and a value degree-execution cost priority table and a deadline-value density priority table are designed for real-time data stream systems. Based on these priority tables, the corresponding semantic load shedding algorithms SLS-PT-VD&EC and SLS-PT-D&TVD are proposed. These algorithms adapt flexibly to different user requirements and support different load shedding goals.
Finally, this dissertation proposes a Load Shedding algorithm for Shared window Join over Real-Time data streams (LS-SJRT), which buffers dropped tuples and exploits query deadlines. To reduce the overhead of LS-SJRT, an improved load shedding algorithm, LS-SJRT-CW, is presented, which adjusts the sizes of the sliding windows. Experimental results show that both algorithms perform well for shared window joins in a real-time data stream system.
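To illustrate why adjusting the window size reduces join load, here is a minimal Python sketch of a symmetric sliding window equi-join whose width can be changed at run time. It is not the LS-SJRT-CW algorithm, whose details the abstract does not give. The class `WindowJoin`, its method names, and the assumption that timestamps arrive in non-decreasing order are all hypothetical.

```python
from collections import deque


class WindowJoin:
    """Symmetric time-based sliding window equi-join over two streams.

    Each window holds (timestamp, key) pairs from the last `width` time units.
    A narrower window means fewer stored tuples to probe per arrival,
    at the cost of missing matches with older tuples.
    """

    def __init__(self, width):
        self.width = width
        self.left = deque()
        self.right = deque()

    def set_width(self, width):
        """Shrink (or restore) the window width, e.g. under overload."""
        self.width = width

    def _expire(self, window, now):
        # Remove tuples that have fallen out of the current window.
        while window and window[0][0] <= now - self.width:
            window.popleft()

    def insert(self, side, ts, key):
        """Insert a tuple on side "L" or "R"; return (new_ts, old_ts) matches."""
        own, other = (self.left, self.right) if side == "L" else (self.right, self.left)
        self._expire(own, ts)
        self._expire(other, ts)
        matches = [(ts, o_ts) for (o_ts, o_key) in other if o_key == key]
        own.append((ts, key))
        return matches
```

A controller could call `set_width` with a smaller value when the join operator is overloaded and restore the original width once the load drops. The policy for choosing the width is left open here.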
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/5886
Appears in Collections: Laboratory of Human-Computer Interaction and Intelligent Information Processing_Dissertations

Files in This Item:
File Name/ File Size Content Type Version Access License
10001_200518015029034马力_paper.pdf (1352KB) -- -- Restricted access -- Contact to request full text

Recommended Citation:
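Ma Li (马力). 实时数据流系统降载方法研究 (Load Shedding in a Real-time Data Stream System) [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2009-01-18.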
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
