Institute of Software, Chinese Academy of Sciences Institutional Repository
Title: Graphical Model Approaches to Knowledge Discovery (知识发现的图模型方法)
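Author: Li Gang (李刚)
Defense date: 2001
Major: Computer Software and Theory
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Doctorate
Keywords: graphical model; directed graphical model; knowledge discovery; probabilistic dependency relation; computational intelligence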
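Abstract: As a product of combining probability theory with graph theory, graphical model theory offers an intuitive and natural way to handle uncertainty and complexity in applied mathematics and engineering. In recent years it has become a significant direction in data mining and knowledge discovery. Following the framework of "knowledge discovery based on directed graphical models", this dissertation studies the theoretical foundations for applying graphical models to knowledge discovery in four areas: discretization preprocessing, structure learning, parameter learning, and model explanation. First, it proposes an unsupervised and a supervised discretization algorithm for continuous numeric attributes. The unsupervised algorithm is based on a mixture probability model and can discretize numeric ranges automatically when neither prior knowledge nor a reference attribute is available: it first builds a mixture probability model of the attribute values, then determines the model parameters with the EM algorithm, and finally uses the Bayes factor to find the optimal number of intervals. The supervised algorithm, Weighted Information Loss Discretization (WILD), extends decision-tree discretization but adopts the bottom-up discretization strategy of the ChiMerge algorithm. Next, the dissertation analyzes structure learning for directed graphical models from the standpoint of approximating the probability density function, formulates the "maximum mutual information principle", analyzes the properties of structure learning under this principle, and proposes a "maximum mutual information principle with an additional penalty term". On this basis it proposes an evolutionary algorithm for structure learning of directed graphical models that can incorporate two kinds of prior knowledge to improve learning efficiency, together with a set of repair operators which ensure that every new structure derived from an existing one is still a valid topology and does not violate the prior knowledge. For parameter learning, the dissertation proposes a hybrid computational intelligence approach: the conditional probability density at each node is represented by an artificial neural network, so that parameter learning no longer requires the parameters to satisfy local independence and no longer requires the user to specify prior parameters. An evolutionary training algorithm for this network then determines the parameter values at each node of the directed graphical model. In addition, the dissertation discusses model explanation for directed graphical models and proposes methods for describing probabilistic dependency relations and conditional independence relations in natural language. Finally, it describes the design and implementation of Dr.Miner, a prototype tool for discovering probabilistic dependency relations.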
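The unsupervised discretization steps described above (fit a mixture model with EM, choose the number of components, derive intervals) can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the dissertation: the mixture components are 1-D Gaussians, the Bayes factor comparison is approximated by the Bayesian information criterion (BIC), and a cut point is placed midway between adjacent sorted values whose most probable component differs. The names `fit_gmm_1d` and `discretize` are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixture_loglik(xs, weights, means, variances):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    return sum(math.log(sum(w * normal_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)) or 1e-300)
               for x in xs)


def fit_gmm_1d(xs, k, iters=200, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture by EM; return (weights, means, variances, loglik)."""
    n = len(xs)
    srt = sorted(xs)
    mean = sum(xs) / n
    base_var = sum((x - mean) ** 2 for x in xs) / n or 1.0
    floor = 1e-6 * base_var  # stops a component collapsing onto a single point
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]  # initial means at spread-out quantiles
    variances = [base_var] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        # E-step: responsibility of each component for each value
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(floor, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj)
        ll = mixture_loglik(xs, weights, means, variances)
        if abs(ll - prev) < tol:
            break
        prev = ll
    return weights, means, variances, ll


def discretize(xs, max_k=4):
    """Pick the number of components by lowest BIC and return (k, sorted cut points)."""
    n = len(xs)
    best = None
    for k in range(1, max_k + 1):
        w, m, v, ll = fit_gmm_1d(xs, k)
        bic = -2.0 * ll + (3 * k - 1) * math.log(n)  # 3k - 1 free parameters
        if best is None or bic < best[0]:
            best = (bic, k, w, m, v)
    _, k, w, m, v = best

    def component(x):
        # most probable component, compared in log space to avoid underflow
        return max(range(k), key=lambda j: math.log(w[j]) - (x - m[j]) ** 2 / (2 * v[j])
                   - 0.5 * math.log(2 * math.pi * v[j]))

    srt = sorted(set(xs))
    cuts = [(a + b) / 2 for a, b in zip(srt, srt[1:]) if component(a) != component(b)]
    return k, cuts
```

Choosing the lowest BIC picks the model with the largest approximate log Bayes factor against every other candidate, because the log Bayes factor between two models is approximately minus one half of their BIC difference.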
English abstract: The graphical model arises from the integration of probability theory with graph theory. It provides a natural tool for dealing with two problems, uncertainty and complexity, in applied mathematics and engineering. In particular, it plays an increasingly important role in the field of Data Mining and Knowledge Discovery. This dissertation explores both theoretical and practical issues in applying graphical models to KDD, including discretization, structure learning, parameter learning and model explanation. Firstly, an unsupervised algorithm and a supervised algorithm are proposed to discretize the original data set. The unsupervised algorithm is based on mixture probabilistic models; it can automatically divide the range of a specified attribute into intervals without prior knowledge or reference attributes. First, a mixture probabilistic model of the attribute values is built; the parameters of this model are then determined with the EM algorithm; finally, the optimal number of intervals is found using the Bayes factor. The supervised algorithm WILD (Weighted Information Loss Discretization) can be considered an extension of the Decision Tree Discretization algorithm, but uses a bottom-up paradigm as in the ChiMerge algorithm. Secondly, this dissertation formulates structure learning of a Directed Graphical Model as determining the structure that best approximates the probability distribution indicated by the data. The Maximum Mutual Information metric is summarized and analyzed, a new metric, the Penalized Mutual Information metric, is proposed, and an evolutionary algorithm is then proposed to search for the best structure among the alternatives. Two kinds of prior knowledge are incorporated to improve efficiency. Several repair operators are designed to ensure that each structure generated during the evolution is a valid DAG and does not violate the prior knowledge.
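The structure-learning idea can be illustrated with a small scoring function and repair operator. For a DAG fitted by maximum likelihood to N records, the log-likelihood equals N·Σ I(X_i; Pa_i) − N·Σ H(X_i); the entropy term does not depend on the structure, so maximizing likelihood amounts to maximizing the total mutual information between each node and its parents. Since adding a parent can never decrease I(X_i; Pa_i), the unpenalized score favours dense graphs, which is why a penalty is needed. In this sketch the penalty is BIC-style (an assumption; the dissertation's exact penalty is not given here), "prior knowledge" is modelled only as a set of forbidden edges (hypothetical), and the search is a plain (1+1) evolutionary loop that toggles one directed edge per generation. The names `penalized_score`, `repair` and `evolve` are illustrative.

```python
import math
import random
from collections import Counter


def entropy(rows, cols):
    """Empirical entropy (in nats) of the joint variable formed by the columns in `cols`."""
    n = len(rows)
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    return -sum(c / n * math.log(c / n) for c in counts.values())


def mutual_info(rows, child, parents):
    """Empirical I(X_child; X_parents) = H(child) + H(parents) - H(child, parents)."""
    if not parents:
        return 0.0
    return entropy(rows, [child]) + entropy(rows, parents) - entropy(rows, [child] + parents)


def penalized_score(rows, parents_of, card):
    """N * sum_i I(X_i; Pa_i) minus an assumed BIC-style penalty of (log N / 2) per free parameter."""
    n = len(rows)
    score = 0.0
    for child, pa in parents_of.items():
        pa = sorted(pa)
        free_params = (card[child] - 1) * math.prod(card[p] for p in pa)
        score += n * mutual_info(rows, child, pa) - 0.5 * math.log(n) * free_params
    return score


def find_cycle_edge(n_vars, edges):
    """Return one edge (u, v) lying on a directed cycle, or None if the graph is a DAG."""
    adj = {v: [] for v in range(n_vars)}
    for a, b in edges:
        adj[a].append(b)
    state = [0] * n_vars  # 0 = unvisited, 1 = on the DFS stack, 2 = finished

    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1:  # back edge: v -> ... -> u -> v closes a cycle
                return (u, v)
            if state[v] == 0:
                found = dfs(v)
                if found:
                    return found
        state[u] = 2
        return None

    for s in range(n_vars):
        if state[s] == 0:
            found = dfs(s)
            if found:
                return found
    return None


def repair(n_vars, edges, forbidden):
    """Repair operator: drop self-loops and forbidden edges, then break cycles one edge at a time."""
    edges = {e for e in edges if e[0] != e[1] and e not in forbidden}
    while (cyc := find_cycle_edge(n_vars, edges)) is not None:
        edges.discard(cyc)
    return edges


def to_parents(n_vars, edges):
    parents = {v: set() for v in range(n_vars)}
    for a, b in edges:
        parents[b].add(a)
    return parents


def evolve(rows, card, forbidden=frozenset(), generations=300, seed=0):
    """(1+1) evolutionary search: toggle one directed edge, repair, keep the child if not worse."""
    rng = random.Random(seed)
    n_vars = len(card)
    best = set()
    best_score = penalized_score(rows, to_parents(n_vars, best), card)
    for _ in range(generations):
        a, b = rng.sample(range(n_vars), 2)
        child = repair(n_vars, best ^ {(a, b)}, forbidden)
        score = penalized_score(rows, to_parents(n_vars, child), card)
        if score >= best_score:
            best, best_score = child, score
    return best, best_score
```

With three binary variables where X1 is a noisy copy of X0 and X2 is independent, `evolve(rows, [2, 2, 2])` should link X0 and X1; which orientation survives is arbitrary, because mutual information is symmetric and both orientations have the same number of free parameters.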
Thirdly, this dissertation proposes a Hybrid Computational Intelligence approach to learning the parameters of a Directed Graphical Model. First, an artificial neural network is used to represent the local conditional probability distribution between a node and its parents; this representation not only avoids the need for prior parameters during parameter learning, but also removes the assumption of local independence between parameters. Then, an evolutionary algorithm is designed to train the neural network, and the parameters of the Directed Graphical Model are induced from the trained network associated with each node. Following that, the problem of model explanation is discussed, and several approaches for interpreting probabilistic dependency relations and conditional independence relations in natural language are proposed. Finally, the design and implementation of a prototype system, Dr. Miner, are described.
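The neural-network representation of a node's conditional distribution and its evolutionary training can be sketched as follows. The details are assumptions, not the dissertation's design: parent values enter as concatenated one-hot vectors, the network has one tanh hidden layer and a softmax output over the child's values, training is a (1+λ) evolution strategy with fixed Gaussian mutation that maximizes the log-likelihood of the observed (parent configuration, child value) counts, and the conditional probability table is read off by evaluating the network on every parent configuration. No prior parameters are specified; the weights are fitted directly to the data. The names `evolve_cpd` and `extract_cpt` are hypothetical.

```python
import itertools
import math
import random
from collections import Counter, defaultdict


def one_hot(config, card_pa):
    """Concatenate one-hot encodings of the parent values, plus a constant bias input."""
    vec = []
    for value, card in zip(config, card_pa):
        vec.extend(1.0 if i == value else 0.0 for i in range(card))
    vec.append(1.0)
    return vec


def forward(weights, x, hidden, n_out):
    """One tanh hidden layer followed by a softmax over the child's n_out values."""
    n_in = len(x)
    w1, w2 = weights[: hidden * n_in], weights[hidden * n_in:]
    h = [math.tanh(sum(w1[j * n_in + i] * x[i] for i in range(n_in))) for j in range(hidden)]
    h.append(1.0)  # bias unit for the output layer
    logits = [sum(w2[k * (hidden + 1) + j] * h[j] for j in range(hidden + 1)) for k in range(n_out)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(weights, counts, card_pa, hidden, n_out):
    """Sum of count * log P(child = y | parents = config) over the observed data."""
    ll = 0.0
    for config, ys in counts.items():
        probs = forward(weights, one_hot(config, card_pa), hidden, n_out)
        ll += sum(c * math.log(probs[y] + 1e-12) for y, c in ys.items())
    return ll


def evolve_cpd(samples, card_pa, n_out, hidden=3, generations=300, offspring=10, sigma=0.3, seed=0):
    """(1+lambda) evolution strategy on the network weights; samples are (config, y) pairs."""
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for config, y in samples:
        counts[tuple(config)][y] += 1
    n_w = hidden * (sum(card_pa) + 1) + n_out * (hidden + 1)
    parent = [rng.gauss(0.0, 0.1) for _ in range(n_w)]
    parent_fit = log_likelihood(parent, counts, card_pa, hidden, n_out)
    for _ in range(generations):
        best, best_fit = None, parent_fit
        for _ in range(offspring):
            child = [w + rng.gauss(0.0, sigma) for w in parent]
            fit = log_likelihood(child, counts, card_pa, hidden, n_out)
            if fit > best_fit:
                best, best_fit = child, fit
        if best is not None:  # keep the parent unless an offspring is strictly better
            parent, parent_fit = best, best_fit
    return parent


def extract_cpt(weights, card_pa, n_out, hidden=3):
    """Read the conditional probability table off the trained network."""
    return {config: forward(weights, one_hot(config, card_pa), hidden, n_out)
            for config in itertools.product(*(range(c) for c in card_pa))}
```

On samples where the child equals 1 in 80% of cases when its single binary parent is 0, and in 20% of cases when the parent is 1, `extract_cpt(evolve_cpd(samples, [2], 2), [2], 2)` should return entries near 0.8 and 0.2; a fixed-step evolution strategy only approximates the maximum-likelihood values.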
Language: Chinese
Content type: Doctoral dissertation
URI: http://ir.iscas.ac.cn/handle/311060/7598
Appears in Collections: Institute of Software, CAS

Files in This Item:
LW004421.pdf (1726 KB); access: restricted; contact the repository to obtain the full text

Recommended Citation:
Li Gang. Graphical Model Approaches to Knowledge Discovery [D]. Institute of Software, Chinese Academy of Sciences. 2001-01-01.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences