Institutional Repository
| BibClus: a clustering algorithm of bibliographic networks by message passing on center linkage structure | |
| Xu Xiaoran; Deng Zhi-Hong | |
| 2011 | |
| Conference name | 11th IEEE International Conference on Data Mining, ICDM 2011 |
| Proceedings title | Proceedings - IEEE International Conference on Data Mining, ICDM |
| Pages | 864-873 |
| Conference dates | December 11, 2011 - December 14, 2011 |
| Conference location | Vancouver, BC, Canada |
| Indexed by | EI |
| ISSN | 1550-4786 |
| ISBN | 9780769544083 |
| Affiliation | (1) Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; (2) State Key Lab of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China |
| Abstract | Multi-type objects with multi-type relations are ubiquitous in real-world networks, e.g. bibliographic networks. Such networks are also called heterogeneous information networks. However, research on clustering for heterogeneous information networks is scarce. A new algorithm, NetClus, has been proposed within the last two years. Although NetClus is applied to a heterogeneous information network with a star network schema, considering the relations between center objects and all attribute objects linked to them, it ignores the relations between center objects, such as citation relations, which also contain rich information. Hence, we argue that the star network schema cannot characterize all possible relations without integrating the linkage structure among center objects, which we call the Center Linkage Structure, and that no sufficiently practical way of handling it has existed so far. In this paper, we present a novel algorithm, BibClus, for clustering heterogeneous objects with center linkage structure, taking a bibliographic information network as an example. In BibClus, we build a probabilistic model, a pairwise hidden Markov random field (P-HMRF), to characterize the center linkage structure, and convert it to a factor graph. We further combine the EM algorithm with factor graph theory and design an efficient method, based on the message passing algorithm, to infer marginal probabilities and estimate parameters at each iteration of EM. We also study how factor functions with different forms and constraints affect clustering performance. To evaluate the proposed method, we conducted thorough experiments on a real dataset crawled from the ACM Digital Library. The experimental results show that BibClus is effective and achieves much higher recall and precision than the recently proposed algorithm NetClus. © 2011 IEEE. |
| Keywords | Clustering Algorithms; Data Mining; Digital Libraries; Graph Theory; Inference Engines; Message Passing; Stars |
| Sponsors | National Science Foundation (NSF) - Where Discoveries Begin; University of Technology Sydney; Google; Alberta Ingenuity Centre for Machine Learning; IBM Research |
| Language | English |
| Content type | Conference paper |
| URI | http://ir.iscas.ac.cn/handle/311060/16283 |
| Collection | Institute of Software, Chinese Academy of Sciences |
| Recommended citation (GB/T 7714) | Xu Xiaoran, Deng Zhi-Hong. BibClus: a clustering algorithm of bibliographic networks by message passing on center linkage structure[C], 2011: 864-873. |
| Files in this item | No files are associated with this item. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.