Smoothing LDA Model for Text Categorization
Li Wenbo; Le Sun; Yuanyong Feng; Dakun Zhang
2008
Conference Name: To be determined
Source: Lecture Notes in Computer Science
Pages: 83-94
Conference Date: 39766
Conference Place: Harbin, China
Indexed Type: EI, ISTP
Publish Place: Beijing
Publisher: Science Press (科学出版社)
ISSN: 1234-5678
English Abstract: Latent Dirichlet Allocation (LDA) is a document-level language model. In general, LDA employs a symmetric Dirichlet distribution as the prior of the topic-word distributions to implement model smoothing. In this paper, we propose a data-driven smoothing strategy in which probability mass is allocated from smoothing data to latent variables by the intrinsic inference procedure of LDA. In this way, the arbitrariness of choosing the latent variables' priors for the multi-level graphical model is overcome. Following this data-driven strategy, two concrete methods, Laplacian smoothing and Jelinek-Mercer smoothing, are applied to the LDA model. Evaluations on different text categorization collections show that data-driven smoothing can significantly improve performance on both balanced and unbalanced corpora.
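Keyword: Text Categorization; Latent Dirichlet Allocation; Smoothing; Graphical Model
Subject: Solid Mechanics
Language: English
Content Type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/808
Collection: National Engineering Research Center of Fundamental Software (基础软件国家工程研究中心)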
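For readers unfamiliar with the two smoothing schemes named in the abstract, the following is a minimal Python sketch of their standard textbook forms applied to a plain unigram word distribution. It is not the paper's data-driven procedure, which allocates probability mass through LDA inference; the function names, the toy counts, and the parameters `alpha` and `lam` are illustrative assumptions that do not come from the record.

```python
from collections import Counter


def laplace_smooth(counts, vocab, alpha=1.0):
    """Additive (Laplace) smoothing: add alpha pseudo-counts to every vocabulary word."""
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def jelinek_mercer_smooth(counts, background, lam=0.5):
    """Jelinek-Mercer smoothing: interpolate the maximum-likelihood estimate
    with a background distribution, using weight lam on the background.
    Assumes every counted word also appears in the background vocabulary."""
    total = sum(counts.values())
    return {
        w: (1.0 - lam) * (counts.get(w, 0) / total if total else 0.0) + lam * p_bg
        for w, p_bg in background.items()
    }


# Toy data (hypothetical): one short document and a collection-level background.
vocab = ["topic", "model", "text", "prior"]
doc_counts = Counter({"topic": 3, "model": 1})
collection_counts = {"topic": 4, "model": 2, "text": 3, "prior": 1}
collection_total = sum(collection_counts.values())
background = {w: c / collection_total for w, c in collection_counts.items()}

laplace = laplace_smooth(doc_counts, vocab, alpha=1.0)
jm = jelinek_mercer_smooth(doc_counts, background, lam=0.5)

for w in vocab:
    print(f"{w:6s} laplace={laplace[w]:.3f} jm={jm[w]:.3f}")
```

In both cases the unseen words ("text", "prior") receive non-zero probability: Laplace smoothing spreads mass uniformly, while Jelinek-Mercer smoothing borrows it from the background distribution in proportion to how common each word is there.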
Recommended Citation
GB/T 7714
Li Wenbo, Le Sun, Yuanyong Feng, et al. Smoothing LDA Model for Text Categorization[C]. Beijing: Science Press, 2008: 83-94.
Files in This Item:
File Name/Size DocType Version Access License
lwb-conf-01.pdf (389KB) Open Access --
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.