Institutional Repository
| Research of Chinese Text Classification Methods Based on Semantic Vector and Semantic Similarity | |
| Song Xin; Huang Jia; Zhou Jing-Min; Chen Xi | |
| 2009 | |
| Conference name | 2009 International Forum on Computer Science-Technology and Applications, IFCSTA 2009 |
| Proceedings title | IFCSTA 2009 Proceedings - 2009 International Forum on Computer Science-Technology and Applications |
| Pages | 187-190 |
| Conference date | 40879 |
| Conference location | Chongqing, China |
| Indexed by | EI |
| Place of publication | United States |
| ISBN | 9780769539300 |
| Affiliation | (1) State Key Laboratory of Software Development Environment, Beihang University, 100191, Beijing, China; (2) Institute of Software, Chinese Academy of Sciences, 100190, Beijing, China |
| Abstract | To overcome the limitations of traditional text classification approaches based on the bag-of-words representation, and to incorporate linguistic knowledge and conceptual indexing into the text vector space model, we describe a document with a semantic vector instead of a traditional keyword vector, drawing on two thesauri, HowNet and Tongyici Cilin (hereinafter Cilin). The semantic vector is built by merging words with high similarity, so that a single concept, rather than a series of words, describes each semantic feature. This not only reduces the feature dimension but also adds semantic information to the vector. We also classify text using sentence (document) similarity based on simple vector distance, and carry out three groups of experiments. The results show that accuracy generally improves with semantic processing, which indicates that semantic mining is important and necessary for text classification. © 2009 IEEE. |
| Keywords | Computer Science; Information Retrieval Systems; Knowledge Representation; Semantics; Vector Spaces; Vectors |
| Sponsor | IITAA - International Information Technology and Applications Association |
| Language | English |
| Content type | Conference paper |
| URI | http://ir.iscas.ac.cn/handle/311060/8434 |
| Collection | 2009 Journal/Conference Papers |
| Recommended citation (GB/T 7714) | Song Xin, Huang Jia, Zhou Jing-Min, et al. Research of Chinese text classification methods based on semantic vector and semantic similarity[C]. United States, 2009: 187-190. |
| Files in this item | There are no files associated with this item. |
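
The approach summarized in the abstract (merging highly similar words into one concept, describing a document by concept counts, and classifying by vector similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The similarity table `TOY_SIMILARITY`, the English toy vocabulary, the 0.8 threshold, the greedy merging order, and the use of cosine similarity against per-class summed vectors are all assumptions made for illustration; the paper derives word similarity from HowNet and Cilin, which are not modeled here.

```python
import math
from collections import Counter

# Hypothetical word-pair similarities, standing in for HowNet/Cilin scores.
TOY_SIMILARITY = {
    frozenset(("car", "automobile")): 0.95,
    frozenset(("car", "vehicle")): 0.85,
    frozenset(("automobile", "vehicle")): 0.85,
    frozenset(("film", "movie")): 0.95,
}


def word_similarity(a, b):
    """Look up the toy similarity of two words (0.0 if the pair is unknown)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def build_concepts(vocabulary, threshold=0.8):
    """Greedily map each word to the first concept representative it resembles."""
    representatives = []
    word_to_concept = {}
    for word in sorted(vocabulary):
        for rep in representatives:
            if word_similarity(word, rep) >= threshold:
                word_to_concept[word] = rep
                break
        else:
            # No sufficiently similar concept exists yet: start a new one.
            representatives.append(word)
            word_to_concept[word] = word
    return word_to_concept


def semantic_vector(tokens, word_to_concept):
    """Count concepts instead of surface words; unknown words map to themselves."""
    return Counter(word_to_concept.get(token, token) for token in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(tokens, class_vectors, word_to_concept):
    """Return the label whose class vector is most similar to the document."""
    doc = semantic_vector(tokens, word_to_concept)
    return max(class_vectors, key=lambda label: cosine(doc, class_vectors[label]))


# Toy training data (hypothetical), already tokenized.
training = {
    "autos": [["automobile", "engine"], ["car", "road"]],
    "cinema": [["film", "actor"], ["movie", "screen"]],
}
vocabulary = {t for docs in training.values() for doc in docs for t in doc}
vocabulary.add("vehicle")  # a word seen only at classification time

word_to_concept = build_concepts(vocabulary)
class_vectors = {
    label: sum((semantic_vector(doc, word_to_concept) for doc in docs), Counter())
    for label, docs in training.items()
}

print(classify(["vehicle", "engine"], class_vectors, word_to_concept))
```

In this toy setup the word "vehicle" never occurs in the training documents, yet it contributes to the match with the "autos" class because it is merged into the same concept as "automobile" and "car". The merge also shrinks the feature set, since several surface words share one dimension.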