Institutional Repository of the Institute of Software, Chinese Academy of Sciences
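Subject: Computer Applications
Title:
Research and Application of Data Stream Decision Tree Classification Algorithms
Author: 侯旭珊
Issued Date: 2014-05-26
Supervisor: 吕品
Major: Computer Application Technology
Degree Grantor: University of Chinese Academy of Sciences
Place of Degree Grantor: Beijing
Degree Level: Master's
Keyword: data stream ; classification ; decision tree ; missing values ; concept drift ; wargame simulation system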
Abstract:

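With the arrival of the big data era, data-intensive systems have come into wide use, and these systems continuously generate high-speed data streams. How to mine valuable information from data streams has become a new research focus in data mining. Unlike traditional static data, data streams are generated dynamically and are real-time, massive, continuous, and fast-changing, which poses major challenges for data stream mining research.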

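Data stream classification is an important research direction in data stream mining and has been applied in practice to network intrusion detection, credit card fraud detection, and many other areas. Real data streams can lose data because of network transmission failures and similar causes, and they can undergo concept drift over time. This thesis therefore focuses on missing-value handling and adaptation to concept drift in data stream decision tree classification, and studies their application to data analysis in a wargame simulation system.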

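First, this thesis surveys the state of research on data stream decision tree classification algorithms and analyzes the requirements that the wargame simulation system places on data stream classification. Through a detailed analysis of the classic algorithms for data stream decision tree classification, it identifies the problems in those algorithms.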

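Second, to address missing data in data streams, this thesis proposes an efficient decision tree algorithm that adaptively handles data streams with missing values. The algorithm improves time performance by adaptively selecting the missing-value handling method, using an improved Bayesian classifier, and optimizing the update mechanism. Simulation results show that it improves time performance by 20% to 70% while keeping the same classification accuracy as existing missing-value handling algorithms.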

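Third, to address concept drift in data streams, this thesis proposes an adaptive concept drift algorithm based on a multi-window mechanism. It strengthens adaptation to concept drift by adaptively determining the size of the sliding window, and it reduces running time by improving the mechanism for creating candidate nodes. Simulation results show that it adapts to concept drift better and runs in less time than existing concept drift algorithms.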

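Finally, based on the requirements that the wargame simulation system places on data stream classification, this thesis designs a data stream classification system for use in wargame simulation. By applying the proposed missing-value handling algorithm and adaptive concept drift algorithm, the system can classify and mine the data streams in a wargame simulation and provide decision support for the simulation process.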
English Abstract:

Many application systems today generate continuous data streams. How to mine valuable information from data streams has become a new research direction in data mining. Data streams are real-time, massive, continuous, and fast-changing, which poses a major challenge for data stream mining research.

Data stream classification is an important research topic in data stream mining and has been applied in network intrusion detection, credit card fraud detection, and many other areas. Real-world data streams often contain missing values and exhibit concept drift. This thesis therefore studies how to handle missing values and concept drift in data stream decision tree classification, and how to apply the resulting algorithms to the wargame.

First, this thesis surveys the state of research on data stream decision tree classification and analyzes the requirements that the wargame places on data stream classification. It then analyzes the classic algorithms for data stream decision tree classification in detail and points out their problems.
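The abstract does not name the classic algorithms it analyzes. As background only, one widely used family of data stream decision trees (Hoeffding trees, e.g. VFDT) decides when to split a leaf by comparing the gain difference of the two best attributes against the Hoeffding bound. The sketch below illustrates that test; the function names, the default `delta`, and the `tie_threshold` parameter are illustrative assumptions, not details taken from the thesis.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean observed
    over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 n_classes: int = 2, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Hoeffding-tree style split test (illustrative defaults, not from the thesis)."""
    # Information gain is bounded by log2(number of classes).
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split when the best attribute is reliably better than the runner-up,
    # or when the bound is so tight that the two are effectively tied.
    return (best_gain - second_gain) > eps or eps < tie_threshold
```

With these defaults, a leaf that has seen 1000 examples would split when the two best attributes differ in gain by more than roughly 0.09.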

Second, this thesis presents EAM, an efficient algorithm for handling missing values in data stream decision tree classification, to avoid the impact of missing values. EAM adaptively selects the missing-value handling method, uses an improved Bayesian classifier, and optimizes the update mechanism, all of which improve time performance. Experimental results show that EAM reduces run time by 20% to 70% while matching the accuracy of existing missing-value handling algorithms.
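The abstract does not say how EAM selects a missing-value handling method. To give a rough sense of what adaptive selection can look like on a stream, here is a minimal sketch that is not EAM: for one categorical attribute it keeps running counts and either fills a missing value with the most frequent value seen so far or, when the attribute is missing too often, returns `None` so the caller can leave the attribute out. The class name `StreamingImputer` and the `skip_threshold` cut-off are hypothetical.

```python
from collections import Counter


class StreamingImputer:
    """Running statistics for one categorical attribute of a data stream.

    Hypothetical illustration only; not the EAM algorithm from the thesis.
    """

    def __init__(self, skip_threshold: float = 0.5):
        self.skip_threshold = skip_threshold  # assumed cut-off on the missing rate
        self.counts = Counter()               # frequency of each observed value
        self.seen = 0                         # examples seen, missing or not
        self.missing = 0                      # examples where the value was missing

    def observe(self, value):
        """Update the statistics with one incoming value (None means missing)."""
        self.seen += 1
        if value is None:
            self.missing += 1
        else:
            self.counts[value] += 1

    def missing_rate(self) -> float:
        return self.missing / self.seen if self.seen else 0.0

    def fill(self, value):
        """Return the value to use, or None if the attribute should be skipped."""
        if value is not None:
            return value
        if not self.counts or self.missing_rate() > self.skip_threshold:
            return None  # too sparse to impute reliably
        return self.counts.most_common(1)[0][0]  # running mode
```

Because both branches use only counters that are updated in constant time per example, the choice adds little per-example cost, which is the kind of property a stream algorithm needs.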

Third, this thesis presents CAMW, a concept-adapting algorithm based on multiple windows for data stream decision tree classification, to adapt to concept drift in the data stream. CAMW adaptively chooses the size of the sliding window, which strengthens its ability to adapt to concept drift. It also improves the mechanism for creating candidate nodes, which reduces its running time. Experimental results show that CAMW adapts to concept drift better and runs in less time than existing concept drift algorithms.
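The abstract gives no details of how CAMW sizes its windows. As a generic illustration of an adaptively sized sliding window, and not the CAMW mechanism itself, the sketch below tracks prediction errors, shrinks the window to its most recent part when the recent error rate exceeds the older error rate by a margin, and otherwise lets the window grow again. The class name `AdaptiveWindow` and the parameters `min_size`, `max_size`, and `margin` are assumptions made for the example.

```python
from collections import deque


class AdaptiveWindow:
    """Sliding window over prediction errors whose size reacts to drift.

    Hypothetical illustration only; not the CAMW algorithm from the thesis.
    """

    def __init__(self, min_size: int = 50, max_size: int = 1000, margin: float = 0.1):
        self.min_size = min_size
        self.max_size = max_size
        self.margin = margin      # assumed gap in error rate that signals drift
        self.size = max_size      # current window capacity
        self.errors = deque()     # 1 for a misclassified example, 0 otherwise

    def add(self, is_error: bool) -> bool:
        """Record one prediction outcome; return True if drift was signalled."""
        self.errors.append(int(is_error))
        while len(self.errors) > self.size:
            self.errors.popleft()
        if len(self.errors) >= 2 * self.min_size:
            items = list(self.errors)
            recent, older = items[-self.min_size:], items[:-self.min_size]
            recent_rate = sum(recent) / len(recent)
            older_rate = sum(older) / len(older)
            if recent_rate - older_rate > self.margin:
                # Drift suspected: forget old examples and restart from a small window.
                self.size = self.min_size
                self.errors = deque(recent)
                return True
        # No drift: allow the window to grow back toward its maximum.
        self.size = min(self.max_size, self.size + 1)
        return False
```

A small window reacts quickly to a new concept, while a large window gives more stable estimates when the concept is stable, so letting the size vary trades off the two.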

Finally, based on the requirements that the wargame places on data stream classification, this thesis designs a data stream classification system for the wargame. The system uses the algorithms proposed in this thesis to classify the wargame's data streams and to provide decision support for the wargame.
Language: Chinese
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/16391
Appears in Collections: National Key Laboratory of Integrated Information System Technology _ Theses

Files in This Item:
File Name/ File Size Content Type Version Access License
硕士学位论文-侯旭珊.pdf (2133KB) ---- Restricted access; contact to obtain the full text

Recommended Citation:
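侯旭珊. Research and Application of Data Stream Decision Tree Classification Algorithms [D]. Beijing. University of Chinese Academy of Sciences. 2014-05-26.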
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.