Subject: | Solid Mechanics
|
Title: | Smoothing LDA Model for Text Categorization |
Author: | Li Wenbo
; Le Sun
; Yuanyong Feng
; Dakun Zhang
|
Source: | Lecture Notes in Computer Science
|
Conference Name: | TBD
|
Conference Date: | 39766
|
Issued Date: | 2008
|
Conference Place: | Harbin, China
|
Keyword: | Text Categorization
; Latent Dirichlet Allocation
; Smoothing
; Graphical Model
|
Publisher: | Science Press
|
Publish Place: | Beijing
|
Indexed Type: | EI, ISTP
|
ISSN: | 1234-5678
|
Abstract: | Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables through the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing priors for the latent variables of the multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora. |
Language: | English
|
Content Type: | Conference Paper
|
URI: | http://ir.iscas.ac.cn/handle/311060/808
|
Appears in Collections: | National Engineering Research Center for Fundamental Software_Conference Papers
|
File Name / File Size | Content Type | Version | Access | License |
|
lwb-conf-01.pdf (389KB) | -- | -- | Restricted (contact to request the full text) | -- |
|
Recommended Citation: |
Li Wenbo, Le Sun, Yuanyong Feng, et al. Smoothing LDA Model for Text Categorization[C]. In: TBD. Harbin, China. 39766.
|