ISCAS OpenIR
A three-phase approach to document clustering based on topic significance degree
Ma, Yinglong (1); Wang, Yao (1); Jin, Beihong (2); Ma, Y.(yinglongma@gmail.com)
2014
Source: Expert Systems with Applications
ISSN: 0957-4174
Volume: 41, Issue: 18, Pages: 8203-8210
English Abstract: A topic model can project documents into a topic space, which facilitates effective document clustering. Selecting a good topic model and improving clustering performance are two highly correlated problems in topic-based document clustering. In this paper, we propose a three-phase approach to topic-based document clustering. In the first phase, we determine the best topic model: we present a formal concept of the significance degree of topics, together with topic selection criteria, through which we find the best number of the most suitable topics from the original topic model discovered by LDA. In the second phase, we choose the initial clustering centers using the k-means++ algorithm. In the third phase, we take the obtained initial clustering centers and apply the k-means algorithm for document clustering. Three clustering solutions based on the three-phase approach are used for document clustering, and experiments on the three solutions compare and illustrate the effectiveness and efficiency of our approach. © 2014 Elsevier Ltd. All rights reserved.
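The second and third phases named in the abstract (k-means++ seeding followed by k-means) can be illustrated with a minimal sketch. It is not the authors' implementation: the first phase (topic selection by significance degree) is not specified in the abstract and is omitted here, the document-topic vectors in `docs` are hypothetical values standing in for LDA output, and squared Euclidean distance and the helper names are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """k-means++ seeding: the first center is drawn uniformly; each later
    center is drawn with probability proportional to its squared distance
    from the nearest center chosen so far."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with an existing center
            centers.append(list(rng.choice(points)))
            continue
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:
                centers.append(list(p))
                break
    return centers

def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means starting from the given centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def cluster_documents(doc_topic, k, seed=0):
    """Phase 2 (seeding) followed by phase 3 (k-means)."""
    rng = random.Random(seed)
    return kmeans(doc_topic, kmeanspp_init(doc_topic, k, rng))

# Hypothetical document-topic vectors (rows: documents, columns: topics).
docs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.85, 0.10],
]
labels, centers = cluster_documents(docs, k=2, seed=0)
print(labels)
```

In this toy input the first three documents are dominated by one topic and the last three by another, so the two groups should receive different cluster labels; which group is labelled 0 depends on the seeding.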
Indexed Type: SCI; EI
Keywords: Document Clustering; Topic Model; K-means; K-means Plus
Department: (1) School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China; (2) Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Language: English
WOS ID: WOS:000342250300015
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16790
Collection: Institute of Software, Chinese Academy of Sciences
Corresponding Author: Ma, Y. (yinglongma@gmail.com)
Recommended Citation
GB/T 7714: Ma, Yinglong, Wang, Yao, Jin, Beihong, et al. A three-phase approach to document clustering based on topic significance degree[J]. Expert Systems with Applications, 2014, 41(18): 8203-8210.
APA: Ma, Yinglong, Wang, Yao, & Jin, Beihong. (2014). A three-phase approach to document clustering based on topic significance degree. Expert Systems with Applications, 41(18), 8203-8210.
MLA: Ma, Yinglong, et al. "A three-phase approach to document clustering based on topic significance degree." Expert Systems with Applications 41.18 (2014): 8203-8210.