ISCAS OpenIR
A three-phase approach to document clustering based on topic significance degree
Ma, Yinglong (1); Wang, Yao (1); Jin, Beihong (2); Ma, Y.(yinglongma@gmail.com)
2014
Journal: Expert Systems with Applications
ISSN: 0957-4174
Volume: 41; Issue: 18; Pages: 8203-8210
Abstract: Topic models can project documents into a topic space, which facilitates effective document clustering. Selecting a good topic model and improving clustering performance are two highly correlated problems in topic-based document clustering. In this paper, we propose a three-phase approach to topic-based document clustering. In the first phase, we determine the best topic model: we present a formal concept of the significance degree of topics and some topic selection criteria, through which we can find the best number of the most suitable topics from the original topic model discovered by LDA. In the second phase, we choose the initial clustering centers by using the k-means++ algorithm. In the third phase, we take the obtained initial clustering centers and use the k-means algorithm for document clustering. Three clustering solutions based on the three-phase approach are used for document clustering. Experiments on the three solutions compare them and illustrate the effectiveness and efficiency of our approach. © 2014 Elsevier Ltd. All rights reserved.
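The second and third phases described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: it assumes phase one (LDA plus topic selection by significance degree, whose definition is not given in this record) has already produced one topic-proportion vector per document. The `doc_topics` values, the function names, and the choice of k = 3 are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """Phase 2 (sketch): choose k initial centers with k-means++ seeding."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers


def kmeans(points, centers, max_iter=100):
    """Phase 3 (sketch): Lloyd's k-means iterations from the given centers."""
    centers = [list(c) for c in centers]  # work on a copy of the input
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical document-topic proportions standing in for phase-one output.
doc_topics = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.05, 0.90],
    [0.05, 0.10, 0.85],
]

rng = random.Random(0)  # fixed seed so the seeding step is repeatable
init_centers = kmeans_pp_init(doc_topics, 3, rng)
labels, centers = kmeans(doc_topics, init_centers)
print(labels)
```

Seeding with k-means++ spreads the initial centers apart, which tends to reduce the chance that k-means settles in a poor local optimum compared with purely uniform random initialization.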
Indexed by: SCI; EI
Keywords: Document Clustering; Topic Model; K-means; K-means Plus
Affiliation: (1) School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China; (2) Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Language: English
WOS accession number: WOS:000342250300015
Content type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16790
Collection: Institute of Software, Chinese Academy of Sciences
Corresponding author: Ma, Y. (yinglongma@gmail.com)
Recommended citation
GB/T 7714: Ma, Yinglong, Wang, Yao, Jin, Beihong. A three-phase approach to document clustering based on topic significance degree[J]. Expert Systems with Applications, 2014, 41(18): 8203-8210.
APA: Ma, Yinglong, Wang, Yao, & Jin, Beihong. (2014). A three-phase approach to document clustering based on topic significance degree. Expert Systems with Applications, 41(18), 8203-8210.
MLA: Ma, Yinglong, et al. "A three-phase approach to document clustering based on topic significance degree." Expert Systems with Applications 41.18 (2014): 8203-8210.