Institutional Repository
| Smoothing LDA Model for Text Categorization | |
| Li Wenbo; Le Sun; Yuanyong Feng; Dakun Zhang | |
| 2008 | |
| Conference Name | To be determined |
| Source | Lecture Notes in Computer Science |
| Pages | 83-94 |
| Conference Date | 2008-11-14 (stored as spreadsheet serial date 39766) |
| Conference Place | Harbin, China |
| Indexed Type | EI, ISTP |
| Publish Place | Beijing |
| Publisher | Science Press |
| ISSN | 1234-5678 |
| English Abstract | Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing the latent variables' priors for the multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora. |
| Keyword | Text Categorization; Latent Dirichlet Allocation; Smoothing; Graphical Model |
| Subject | Solid Mechanics |
| Language | English |
| Content Type | Conference Paper |
| URI | http://ir.iscas.ac.cn/handle/311060/808 |
| Collection | National Engineering Research Center of Fundamental Software |
| Recommended Citation GB/T 7714 | Li Wenbo, Le Sun, Yuanyong Feng, et al. Smoothing LDA Model for Text Categorization[C]. Beijing: Science Press, 2008: 83-94. |
| Files in This Item: | ||||||
| File Name/Size | DocType | Version | Access | License | ||
lwb-conf-01.pdf (389KB) | Open Access | -- | Application Full Text | |||
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.