Research and Application of Data Stream Decision Tree Classification Algorithms
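Author: 侯旭珊
Major: Computer Application Technology
Supervisor: 吕品
Date: 2014-05-26
Degree Grantor: University of Chinese Academy of Sciences
Degree Level: Master's
Place of Degree Grantor: Beijing
Keywords: data stream, classification, decision tree, missing values, concept drift, wargame simulation system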
Abstract

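With the arrival of the big data era, data-intensive systems have come into wide use, and these systems continuously generate high-speed data streams. How to mine valuable information from data streams has become a new research focus in data mining. Unlike traditional static data, data streams are generated dynamically and are real-time, massive, continuous, and rapidly changing. These characteristics pose great challenges for research on data stream mining.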

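Data stream classification is an important research direction in data stream mining and has been applied in practice to network intrusion detection, credit card fraud detection, and many other areas. Real-world data streams, however, can lose data because of network transmission failures and other causes, and their underlying concepts can drift over time. This thesis therefore studies missing-value handling and adaptation to concept drift in data stream decision tree classification, and applies the results to data analysis in a wargame simulation system.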

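First, this thesis surveys the state of research on data stream decision tree classification algorithms and analyzes the data stream classification requirements of the wargame simulation system. Through a detailed analysis of the classic algorithms in data stream decision tree classification, it identifies their shortcomings.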
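The abstract does not name the classic algorithms it analyzes. As background, the sketch below shows the split test of VFDT (the Hoeffding tree), the best-known stream decision tree learner. It is an illustration under that assumption, not code from the thesis. The function names `hoeffding_bound` and `should_split` and the tie threshold of 0.05 are hypothetical choices.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with the given range lies within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_classes: int,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """VFDT-style test: split a leaf once the best attribute is reliably best."""
    # Information gain (base-2 entropy) is bounded by log2(number of classes).
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split if the gap exceeds eps, or if eps is so small that the two
    # candidates are effectively tied and waiting longer gains nothing.
    return best_gain - second_gain > eps or eps < tie_threshold


print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))  # → 0.0898
print(should_split(0.30, 0.10, 2, 1e-7, 1000))     # → True
print(should_split(0.30, 0.28, 2, 1e-7, 1000))     # → False
```

With two classes (gain range 1), δ = 10⁻⁷ and 1,000 examples at a leaf, ε is about 0.09, so a gain gap of 0.2 triggers a split while a gap of 0.02 does not. At 10,000 examples ε drops to about 0.028, below the tie threshold, and the leaf splits anyway.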

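Second, to address missing data in data streams, this thesis proposes an efficient decision tree algorithm that adaptively handles data streams with missing values. The algorithm adaptively selects the missing-value handling method, uses an improved Bayesian classifier, and optimizes the update mechanism to improve time performance. Simulation results show that, while matching the classification accuracy of existing missing-value handling algorithms, the proposed algorithm improves time performance by 20% to 70%.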
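The abstract does not describe EAM's improved Bayesian classifier or how EAM selects a missing-value method. As a hedged illustration of the kind of baseline such work builds on, the sketch below is a plain incremental naive Bayes classifier (the sort sometimes placed at the leaves of stream decision trees) that tolerates missing values by skipping them. Under the naive independence assumption, skipping a missing attribute is the same as marginalizing it out. The class `MissingAwareNaiveBayes`, the use of `None` for a missing value, and Laplace smoothing are assumptions and do not reflect the thesis's design.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Set


class MissingAwareNaiveBayes:
    """Incremental categorical naive Bayes; None marks a missing attribute value."""

    def __init__(self, n_attributes: int, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant
        self.total = 0
        self.class_counts: Dict[Hashable, int] = defaultdict(int)
        # pair_counts[i][(value, label)]: co-occurrences of a value of attribute i with a label
        self.pair_counts: List[Dict] = [defaultdict(int) for _ in range(n_attributes)]
        # seen[i][label]: number of non-missing values of attribute i observed with label
        self.seen: List[Dict] = [defaultdict(int) for _ in range(n_attributes)]
        self.domains: List[Set] = [set() for _ in range(n_attributes)]

    def update(self, x: Sequence[Optional[Hashable]], label: Hashable) -> None:
        """Learn from one labeled example; missing attributes leave their counts untouched."""
        self.total += 1
        self.class_counts[label] += 1
        for i, v in enumerate(x):
            if v is None:
                continue
            self.pair_counts[i][(v, label)] += 1
            self.seen[i][label] += 1
            self.domains[i].add(v)

    def predict(self, x: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
        """Return the most probable label, ignoring (marginalizing out) missing attributes."""
        n_classes = len(self.class_counts)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Smoothed log prior P(label).
            score = math.log((count + self.alpha) / (self.total + self.alpha * n_classes))
            for i, v in enumerate(x):
                if v is None:
                    continue  # missing value: P(v | label) summed over all v is 1
                k = len(self.domains[i]) + 1  # known values plus one slot for unseen ones
                num = self.pair_counts[i].get((v, label), 0) + self.alpha
                den = self.seen[i].get(label, 0) + self.alpha * k
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = MissingAwareNaiveBayes(n_attributes=2)
for x, y in [(("sunny", "hot"), "no"), (("sunny", None), "no"),
             (("rain", "mild"), "yes"), ((None, "mild"), "yes"), (("rain", "cool"), "yes")]:
    nb.update(x, y)
print(nb.predict(("sunny", None)))  # → no
print(nb.predict((None, "mild")))   # → yes
```

Because every update and prediction touches only the observed attributes, the classifier needs no imputation pass. An algorithm that chooses adaptively among methods, as EAM is described as doing, would need extra logic beyond this sketch.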

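Third, to address concept drift in data streams, this thesis proposes an adaptive concept drift algorithm based on a multi-window mechanism. Adaptively determining the size of the sliding window strengthens the algorithm's ability to adapt to concept drift, and an improved mechanism for building candidate nodes reduces its running time. Simulation results show that the proposed algorithm adapts to concept drift better and runs faster than existing concept drift algorithms.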
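The abstract does not specify how CAMW sizes its windows. One well-known way to let a sliding window's length adapt to drift is the cut test of ADWIN. The sketch below is a simplified, quadratic-time variant of that test applied to a stream of 0/1 classification errors. It is not CAMW itself. The class name `AdaptiveErrorWindow` and the default `delta` are hypothetical.

```python
import math
from collections import deque


class AdaptiveErrorWindow:
    """Window over a 0/1 error stream whose length shrinks when the error rate changes."""

    def __init__(self, delta: float = 0.002) -> None:
        self.delta = delta  # confidence parameter of the cut test
        self.window: deque = deque()

    def _cut_found(self) -> bool:
        """Check every split into older W0 / newer W1 for a significant mean difference."""
        items = list(self.window)
        n = len(items)
        total = sum(items)
        log_term = math.log(4.0 * n / self.delta)
        head_sum = 0
        for n0 in range(1, n):
            head_sum += items[n0 - 1]
            n1 = n - n0
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(log_term / (2.0 * m))
            if abs(mean0 - mean1) > eps_cut:
                return True
        return False

    def add(self, error: int) -> bool:
        """Append 1 (misclassified) or 0 (correct); return True if old items were dropped."""
        self.window.append(error)
        shrank = False
        # Drop the oldest items until no split shows a significant change.
        while len(self.window) > 1 and self._cut_found():
            self.window.popleft()
            shrank = True
        return shrank


w = AdaptiveErrorWindow()
for _ in range(300):
    w.add(0)  # stable phase: no errors
after_stable = len(w.window)
drift_flags = [w.add(1) for _ in range(300)]  # abrupt drift: every prediction wrong
print(after_stable, len(w.window), any(drift_flags))
```

During the stable phase the window grows to all 300 items. Shortly after the error rate jumps, most of the old zeros are discarded, so the window's mean tracks the new error rate. A classifier could then retrain on, or weight, only the data that remains in the window.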

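Finally, based on the data stream classification requirements of the wargame simulation system, this thesis designs a data stream classification system for wargame simulation. Using the proposed missing-value handling algorithm and adaptive concept drift algorithm, the system classifies and mines the data streams produced during simulation, providing decision support for the simulation process.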

Many application systems today generate continuous data streams. How to mine information from data streams has become a new research direction in data mining. Data streams are real-time, massive, continuous, and rapidly changing, which poses a huge challenge to research on data stream mining.

Data stream classification is an important research topic in data stream mining and has been applied to network intrusion detection, credit card fraud detection, and many other areas. Real-world data streams often contain missing values and undergo concept drift. Therefore, this thesis studies how to handle missing values and concept drift in data stream decision tree classification, and how to apply these algorithms in the wargame.

Firstly, this thesis surveys the state of research on data stream decision tree classification and analyzes the requirements of the wargame for data stream classification. It analyzes the classic algorithms in data stream decision tree classification in detail and points out their problems.

Secondly, this thesis presents an efficient algorithm for handling missing values in data stream decision tree classification (EAM) to mitigate the impact of missing values. EAM adaptively selects a method for handling missing values, uses an improved Bayesian classifier, and optimizes the update mechanism to improve time performance. Experimental results show that EAM reduces running time by 20% to 70% while matching the accuracy of existing algorithms.

Thirdly, this thesis presents a concept-adapting algorithm based on multiple windows for data stream decision tree classification (CAMW) to adapt to concept drift in data streams. CAMW adaptively chooses the size of the sliding window, which strengthens its ability to adapt to concept drift. CAMW also improves the mechanism for creating candidate nodes, reducing its time complexity. Experimental results show that CAMW adapts to concept drift better and runs faster than existing concept drift algorithms.

Finally, based on the requirements of the wargame for data stream classification, this thesis designs a data stream classification system for the wargame. The system uses the proposed algorithms to classify the wargame's data streams and provide decision support.
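Subject: Computer Applications
Language: Chinese
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/16391
Collection: State Key Laboratory of Space-Based Integrated Information Systems (天基综合信息系统全国重点实验室)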
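Recommended Citation (GB/T 7714):
侯旭珊. 数据流决策树分类算法的研究与应用 (Research and Application of Data Stream Decision Tree Classification Algorithms)[D]. Beijing: University of Chinese Academy of Sciences, 2014.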
Files in This Item: 硕士学位论文-侯旭珊.pdf (2133 KB), open access
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.