Institutional Repository of the Institute of Software, Chinese Academy of Sciences
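Title:
基于数据挖掘的案例库维护方法研究 (Research on Case Base Maintenance Methods Based on Data Mining)
Author: 李雄锋 (Li Xiongfeng)
Defense date: 2003
Major: Computer Software and Theory
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Doctorate
Keywords: data mining; case-based reasoning; case base maintenance
Alternative title: Case Base Maintenance Based on Data Mining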
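Abstract: As an effective technique for discovering implicit knowledge in large-scale data, data mining and its related algorithms have attracted broad interest in recent years and have been applied in many related fields. Meanwhile, with the wide use of case-based reasoning (CBR) systems in organizational knowledge management, case bases keep growing in size, which has drawn attention to case base maintenance. The focus is on how to adopt suitable strategies and techniques to improve case quality, improve case base access performance, and raise the efficiency and competence of CBR systems. Against this background, and based on the practical needs of case base maintenance in organizations, this thesis studies data mining techniques for case bases and case access records, together with case base maintenance techniques, from two angles: improving case quality and improving access performance. To support mining of the case base, the thesis proposes a case representation based on weighted feature vectors, built on the object-oriented case representation. On this basis, it improves an existing domain-independent case similarity comparison algorithm and uses that algorithm in data mining to build descriptive models of case features. In addition, the thesis analyzes the content and representation of case records and studies the dynamic case access models contained in them from two perspectives: access transactions and access sequences. This work provides the technical foundation for case base maintenance. Building on data mining of case features and case access records, the thesis studies case base maintenance methods in two areas: content maintenance and performance maintenance. For content maintenance, which aims to improve the quality of the cases in the case base, the thesis studies four maintenance techniques: detecting inconsistent cases with outlier analysis, completing incomplete cases with classification, detecting redundant cases with clustering, and detecting junk cases with trend analysis. For performance maintenance, which aims to speed up access to the case base, the thesis uses data mining to improve a commonly used case base layering (partitioning) algorithm and proposes two methods: caching frequently used cases and prefetching cases that are often accessed together. The thesis solves case base maintenance problems in CBR systems by means of data mining. However, the methods and techniques discussed are not limited to CBR systems; the work also offers some reference value for the organizational knowledge asset repositories that various knowledge management systems need to maintain.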
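The weighted feature vector representation and similarity comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes cases are dictionaries of numeric or categorical feature values, that weights are non-negative, and that numeric features are scaled by a known value range. The names `local_similarity`, `weighted_similarity`, and `find_redundant_pairs`, and the 0.95 threshold, are hypothetical. Redundancy is flagged here with a simple pairwise threshold in place of the clustering technique the abstract names.

```python
from itertools import combinations


def local_similarity(a, b, value_range=None):
    """Similarity of two feature values, in [0, 1]."""
    if a is None or b is None:
        return 0.0  # assumption: a missing value contributes no similarity
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if not value_range:
            return 1.0 if a == b else 0.0
        # Numeric features: distance scaled by the (assumed known) range.
        return max(0.0, 1.0 - abs(a - b) / value_range)
    # Categorical features: exact match only.
    return 1.0 if a == b else 0.0


def weighted_similarity(case_a, case_b, weights, ranges=None):
    """Weighted average of per-feature similarities."""
    ranges = ranges or {}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = sum(
        w * local_similarity(case_a.get(f), case_b.get(f), ranges.get(f))
        for f, w in weights.items()
    )
    return score / total


def find_redundant_pairs(cases, weights, ranges=None, threshold=0.95):
    """Return pairs of case ids whose similarity reaches the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(cases.items(), 2)
        if weighted_similarity(a, b, weights, ranges) >= threshold
    ]
```

For example, with weights `{"os": 2.0, "ram_gb": 1.0}` and a range of 16 for `ram_gb`, two cases that agree on both features score 1.0 and would be reported as a candidate redundant pair.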
English abstract: As an effective technique for the nontrivial extraction of previously unknown and potentially useful information from very large amounts of data, data mining and related algorithms have been widely studied in recent years and have found a wide range of applications. With the proliferation of Case-Based Reasoning (CBR) systems in organizational knowledge management, many case bases are becoming unwieldy legacy systems, which has drawn attention to case base maintenance. Case base maintenance focuses on adopting proper techniques and policies to improve case quality, boost access performance, and increase the efficiency and competence of CBR systems. Against this background of real-world requirements, the thesis details how to maintain case bases from the viewpoints of case quality and access performance by applying data mining to case bases and case access logs. To support data mining in case bases, the thesis proposes a case representation that builds on the traditional object-oriented representation: a case is represented by a weighted feature vector. With this representation, an improved algorithm for case similarity measurement is proposed and used in the data mining algorithms that model case features. Furthermore, the content and representation of case access logs are addressed, and the implicit dynamic case access models are studied from the viewpoints of access transactions and access sequences. Together these form the basis for the case base maintenance techniques. The research on case base maintenance techniques is carried out from two perspectives: content maintenance and performance maintenance. The underlying technique is data mining in case bases and case access logs. The purpose of content maintenance is to improve the quality of cases. In this part, the following techniques are proposed: finding inconsistent cases with an outlier detection algorithm, completing incomplete cases with a classification algorithm, detecting redundant cases with a clustering algorithm, and discovering spam cases by performing trend analysis on case base access logs. The purpose of performance maintenance is to improve access speed. In this part, the following techniques are specified: caching frequently accessed cases, partitioning the case base to limit the number of cases accessible in each CBR cycle, and prefetching cases that are usually accessed together in CBR cycles. For each technique, its application, realization, algorithm, and effectiveness are analyzed, and a maintenance policy for the solutions based on these algorithms is also presented. The thesis solves case base maintenance problems in CBR systems with data mining techniques, but the methods and policies discussed are not confined to CBR systems. It is hoped that the discussion will be helpful for the maintenance of knowledge bases in knowledge management systems.
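The performance-maintenance ideas in the abstract (caching frequently accessed cases and prefetching cases that are accessed together) can be sketched from access transactions as follows. This is an illustrative sketch only, not the method of the thesis: it assumes each access transaction is a list of case ids retrieved in one CBR cycle, uses plain frequency and pair co-occurrence counts instead of any specific mining algorithm, and the names `frequent_cases`, `prefetch_table`, `cache_size`, and `min_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_cases(transactions, cache_size):
    """Pick the most frequently accessed case ids as cache candidates."""
    # Count each case at most once per transaction.
    counts = Counter(cid for t in transactions for cid in set(t))
    return [cid for cid, _ in counts.most_common(cache_size)]


def prefetch_table(transactions, min_support):
    """Map each case id to the cases co-accessed with it at least min_support times."""
    pair_counts = Counter()
    for t in transactions:
        # Sorting makes (a, b) and (b, a) count as the same pair.
        for a, b in combinations(sorted(set(t)), 2):
            pair_counts[(a, b)] += 1
    table = {}
    for (a, b), n in pair_counts.items():
        if n >= min_support:
            table.setdefault(a, set()).add(b)
            table.setdefault(b, set()).add(a)
    return table
```

When a case is retrieved, a system built along these lines could look it up in the table and load the associated cases ahead of the next request; the support threshold trades prefetch accuracy against wasted loads.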
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/6040
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name / File Size: LW011214.pdf (2759KB)
Content Type: --
Version: --
Access: Restricted
License: --
Full text: available on request

Recommended Citation:
李雄锋. 基于数据挖掘的案例库维护方法研究[D]. 中国科学院软件研究所. 中国科学院软件研究所. 2003-01-01.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences