ISCAS OpenIR
A comparative study of TF*IDF, LSI and multi-words for text classification
Zhang Wen; Yoshida Taketoshi; Tang Xijin
2011
Source: Expert Systems with Applications
ISSN: 0957-4174
Volume: 38, Issue: 3, Pages: 2758-2765
English Abstract: One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intelligent information processing. Generally, text representation includes two tasks: indexing and weighting. This paper comparatively studies TFIDF, LSI and multi-words for text representation. We used a Chinese and an English document collection, respectively, to evaluate the three methods in information retrieval and text categorization. Experimental results demonstrate that in text categorization, LSI performs better than the other methods on both document collections. LSI also produced the best performance in retrieving English documents. This outcome shows that LSI has both favorable semantic and statistical quality, which contradicts the claim that LSI cannot produce discriminative power for indexing. © 2010 Elsevier Ltd. All rights reserved.
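As an illustration of the weighting task mentioned in the abstract, the following is a minimal sketch of one common TF*IDF variant, weight = tf × log(N / df). The abstract does not state which variant the paper uses, so the formula, the function name `tfidf`, and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy collection, already tokenized (the indexing step).
docs = [
    ["text", "mining", "text"],
    ["text", "retrieval"],
    ["latent", "semantic", "indexing"],
]
w = tfidf(docs)
```

Under this variant, a term that occurs in every document receives weight zero, while a term confined to a single document is weighted most heavily.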
Indexed Type: EI
Keyword: Data Mining; Indexing (Of Information); Information Retrieval; Natural Language Processing Systems
Department: (1) Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; (2) School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan; (3) Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
Language: English
WOS ID: WOS:000284863200158
Citation statistics
Cited Times: 461 [WOS]
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/14095
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Zhang Wen, Yoshida Taketoshi, Tang Xijin. A comparative study of TF*IDF, LSI and multi-words for text classification[J]. Expert Systems with Applications, 2011, 38(3): 2758-2765.
APA: Zhang Wen, Yoshida Taketoshi, & Tang Xijin. (2011). A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications, 38(3), 2758-2765.
MLA: Zhang Wen, et al. "A comparative study of TF*IDF, LSI and multi-words for text classification". Expert Systems with Applications 38.3 (2011): 2758-2765.
Files in This Item:
File Name/Size DocType Version Access License
A comparative study (493KB) Open Access -- Application Full Text
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.