ISCAS OpenIR
word combination kernel for text categorization
Zhang Lujiang; Hu Xiaohui; Qin Shiyin
2012
Journal: Journal of Digital Information Management
ISSN: 0972-7272
Volume: 10  Issue: 3  Pages: 202-211
Abstract: We propose a novel kernel for text categorization. This kernel is an inner product in the feature space generated by all word combinations of a specified length. A word combination is a collection of different words co-occurring in the same sentence. A word combination of length k is weighted by the k-th root of the product of the inverse document frequencies (IDF) of its words. A computationally simple and efficient algorithm is proposed to calculate this kernel. By restricting the words of a word combination to the same sentence and considering multi-word combinations, the word combination features can capture similarity at a more specific level than single words. By discarding word order, the word combination features are more compatible with the flexibility of natural language, and the dimensionality of this kernel can be reduced significantly compared to that of the word-sequence kernel. We conducted a series of experiments on the Reuters-21578 and 20 Newsgroups datasets. This kernel consistently achieves better performance than the classical word kernel and the word-sequence kernel on both datasets. We also assessed the impact of word combination length on performance and compared the computational efficiency of this kernel with that of the word kernel and the word-sequence kernel.
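To make the feature definition concrete, here is a minimal brute-force sketch in Python, not the authors' efficient algorithm (which the abstract does not describe). It follows the abstract in three points: a feature is an unordered set of k distinct words from one sentence, its weight is the k-th root of the product of the words' IDFs, and the kernel is the inner product of the two feature vectors. Everything else is an assumption for illustration: documents arrive pre-split into tokenized sentences, IDF is taken as log(N / df), a document's value for a combination is that weight summed over the sentences containing it, and the function names `idf_table`, `combination_features`, and `word_combination_kernel` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def idf_table(corpus):
    """Assumed IDF definition: log(N / df(w)), df = number of documents containing w."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        # Each document contributes each distinct word once.
        df.update({w for sentence in doc for w in sentence})
    return {w: math.log(n_docs / c) for w, c in df.items()}


def combination_features(doc, k, idf):
    """Map every k-word combination (distinct words, same sentence, order ignored)
    to its weight summed over the sentences where it occurs (an assumption)."""
    feats = Counter()
    for sentence in doc:
        words = sorted(set(sentence))  # discard word order and repeated words
        for combo in combinations(words, k):
            # k-th root of the product of the words' IDFs (their geometric mean)
            weight = math.prod(idf.get(w, 0.0) for w in combo) ** (1.0 / k)
            feats[combo] += weight
    return feats


def word_combination_kernel(doc_a, doc_b, k, idf):
    """Inner product of the two documents' word-combination feature vectors."""
    fa = combination_features(doc_a, k, idf)
    fb = combination_features(doc_b, k, idf)
    if len(fb) < len(fa):
        fa, fb = fb, fa  # iterate over the smaller feature map
    return sum(v * fb[c] for c, v in fa.items() if c in fb)


if __name__ == "__main__":
    # Hypothetical toy corpus: documents -> sentences -> tokens.
    corpus = [
        [["oil", "prices", "rose"], ["crude", "oil", "supply", "fell"]],
        [["oil", "prices", "fell"], ["markets", "rose"]],
        [["wheat", "harvest", "grew"]],
    ]
    idf = idf_table(corpus)
    print(word_combination_kernel(corpus[0], corpus[1], 2, idf))
```

With k = 2, the first two documents share only the pairs ("oil", "prices") and ("fell", "oil"), so the kernel is 2·log(1.5)², about 0.3288. The pair ("prices", "rose") does not match even though both words occur in both documents, because in the second document they sit in different sentences; this is the sentence restriction the abstract credits with capturing more specific similarity. Direct enumeration costs C(n, k) combinations per sentence of n distinct words, which is why an efficient algorithm matters for larger k.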
Indexed in: EI
Keywords: Algorithms; Learning Systems; Support Vector Machines
Affiliations: (1) School of Automation Science and Electrical Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100191, China; (2) Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Language: English
Content type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/15034
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Zhang Lujiang, Hu Xiaohui, Qin Shiyin. word combination kernel for text categorization[J]. Journal of Digital Information Management, 2012, 10(3): 202-211.
APA: Zhang Lujiang, Hu Xiaohui, & Qin Shiyin. (2012). word combination kernel for text categorization. Journal of Digital Information Management, 10(3), 202-211.
MLA: Zhang Lujiang, et al. "word combination kernel for text categorization." Journal of Digital Information Management 10.3 (2012): 202-211.
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.