Title: | Research and Application of Multiple-Instance Learning Algorithms (多示例学习算法研究与应用) |
Author: | 蒋剑
|
Issued Date: | 2007-06-01
|
Degree Grantor: | Institute of Software, Chinese Academy of Sciences
|
Place of Degree Grantor: | Institute of Software
|
Degree Level: | Doctoral
|
Keyword: | Multiple-instance learning
; Machine learning
; k-nearest-neighbor algorithm
; Lazy learning
|
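Abstract: | As a new learning framework situated between traditional supervised learning and unsupervised learning, multiple-instance learning has as its main task the classification of training bags that each contain multiple instances. Unlike traditional supervised learning, however, the training samples in multiple-instance learning do not give labels for the instances inside a bag; only the label of each bag is given. Multiple-instance learning is therefore considerably harder than traditional supervised learning. Based on the characteristics of multiple-instance data, this thesis proposes a multiple-instance learning algorithm built on the k-nearest-neighbor algorithm, and experiments on the Musk benchmark dataset show that it clearly improves on the results of earlier algorithms. In addition, an examination of the Musk data shows that the 166 attributes are not all equally important for classifying instances; some attributes are only weakly related to the instance class, or are irrelevant to it. One remedy is to stretch the coordinate axes by a value that varies over the instance space; here, the relevance of each attribute can serve as the weight for stretching its axis. The thesis therefore also examines whether the k-nearest-neighbor algorithm can produce better results through attribute selection, and the results show that feature selection further improves the accuracy of the original algorithm. Finally, the thesis also examines other traditional algorithms, such as radial basis function networks among artificial neural networks, carries out extensive experiments, and evaluates the new algorithm on the benchmark data commonly used internationally in the multiple-instance learning community, obtaining good results. |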
English Abstract: | Situated between traditional supervised learning and unsupervised learning, multiple-instance learning considers the classification of bags, each of which consists of several instances. In multiple-instance learning, however, individual instances are not labeled; only bags are labeled by a teacher as being overall positive or negative. Multiple-instance learning is therefore much more difficult than traditional supervised learning. This thesis presents a new algorithm based on k-nearest neighbor to approach the multiple-instance problem. Experiments on the drug activity prediction benchmark data show that the algorithm ranks among the best ones. Meanwhile, considering the characteristics of the Musk data, we can see that the 166 features do not all contribute equally to the classification. On the contrary, some features have little or nothing to do with it. The coordinates are therefore rescaled according to the importance of each feature to see whether the algorithm can achieve better results through feature selection. Furthermore, this thesis also carries out some experiments based on artificial neural networks. |
Language: | Chinese
|
Content Type: | Thesis
|
URI: | http://ir.iscas.ac.cn/handle/311060/5844
|
Appears in Collections: | Institute of Software, Chinese Academy of Sciences
|
File Name/ File Size | Content Type | Version | Access | License |
10001_200428015006001蒋剑_paper.pdf(561KB) | -- | -- | Restricted access | -- | Contact to request full text |
|
Recommended Citation: |
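蒋剑. Research and Application of Multiple-Instance Learning Algorithms (多示例学习算法研究与应用) [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2007-06-01.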
|