Subject: | Solid Mechanics
|
Title: | Smoothing LDA Model for Text Categorization |
Authors: | Li Wenbo
; Le Sun
; Yuanyong Feng
; Dakun Zhang
|
Proceedings: | Lecture Notes in Computer Science
|
Conference Name: | To be determined
|
Conference Date: | 39766
|
Publication Date: | 2008
|
Conference Location: | Harbin, China
|
Keywords: | Text Categorization
; Latent Dirichlet Allocation
; Smoothing
; Graphical Model
|
Publisher: | Science Press
|
Place of Publication: | Beijing
|
Indexed By: | EI, ISTP
|
ISSN: | 1234-5678
|
Abstract: | Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing the latent variables' priors for the multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora. |
Language: | English
|
Content Type: | Conference Paper
|
URI: | http://ir.iscas.ac.cn/handle/311060/808
|
Appears in Collections: | National Engineering Research Center for Fundamental Software: Conference Papers
|
File Name/ File Size |
Content Type |
Version |
Access |
License |
|
lwb-conf-01.pdf(389KB) | -- | -- | Restricted access | -- | Contact to request full text |
|
Recommended Citation: |
Li Wenbo, Le Sun, Yuanyong Feng, et al. Smoothing LDA Model for Text Categorization[C]. In: To be determined. Harbin, China. 39766.
|
|
|
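Illustrative note: the abstract names Laplacian smoothing and Jelinek-Mercer smoothing but gives no formulas. The sketch below shows only the textbook forms of these two estimators applied to plain word counts. It is not the paper's data-driven procedure inside LDA inference, and the function names, the `alpha` and `lam` parameters, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def laplace_smoothed(counts, vocab, alpha=1.0):
    """Additive (Laplace) smoothing: add alpha pseudo-counts to every vocabulary word."""
    total = sum(counts.get(w, 0) for w in vocab)
    denom = total + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / denom for w in vocab}


def jelinek_mercer(counts, background, lam=0.5):
    """Jelinek-Mercer smoothing: linear interpolation of the maximum-likelihood
    estimate with a background distribution, weighted by lam."""
    total = sum(counts.get(w, 0) for w in background)
    smoothed = {}
    for w, p_bg in background.items():
        p_ml = counts.get(w, 0) / total if total else 0.0
        smoothed[w] = (1.0 - lam) * p_ml + lam * p_bg
    return smoothed


# Hypothetical toy data: one short document and a four-word vocabulary.
doc_counts = Counter("topic model topic".split())
vocab = ["topic", "model", "word", "text"]
background = {w: 1.0 / len(vocab) for w in vocab}  # uniform background (assumed)

p_laplace = laplace_smoothed(doc_counts, vocab)
p_jm = jelinek_mercer(doc_counts, background, lam=0.5)
```

In both cases, words that never occur in the document ("word", "text") receive nonzero probability. Laplace smoothing takes that mass from fixed pseudo-counts, while Jelinek-Mercer smoothing takes it from a separate background distribution.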