ISCAS OpenIR
Tibetan Word Segmentation as Syllable Tagging Using Conditional Random Field
Liu Huidan; Nuo Minghua; Ma Longlong; Wu Jian; He Yeping
2011
Published in: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Pages: 168-177
Abstract: In this paper, we propose a novel approach to Tibetan word segmentation using a conditional random field. We reformulate segmentation as a syllable tagging problem. The approach labels each syllable with a word-internal position tag and combines syllables into words according to their tags. As there is no publicly available Tibetan word segmentation corpus, the training corpus is generated by another segmenter, which has an F-score of 96.94% on the test set. Two feature template sets, TMPT-6 and TMPT-10, are used and compared, and the results show that the former is better. Experiments also show that a larger training set improves performance significantly. Trained on a set of 131,903 sentences, the segmenter achieves an F-score of 95.12% on the test set of 1,000 sentences. © 2011 by Huidan Liu, Minghua Nuo, Longlong Ma, Jian Wu, and Yeping He.
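The tagging reformulation described in the abstract can be illustrated with a minimal sketch. The abstract does not name the tag set, so the four-tag scheme below (B = word-initial, M = word-middle, E = word-final, S = single-syllable word) is an assumption, as are the function names and the placeholder syllables. The CRF itself is not shown; only the conversion between segmented words and per-syllable tags is sketched.

```python
from typing import List

# Assumed tag set (not taken from the paper):
# B = first syllable of a word, M = middle syllable,
# E = last syllable, S = single-syllable word.


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Turn a segmented sentence (a list of words, each a list of
    syllables) into one position tag per syllable."""
    tags: List[str] = []
    for word in words:
        n = len(word)
        if n == 0:
            continue  # skip empty words rather than emit a tag
        if n == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (n - 2) + ["E"])
    return tags


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Combine syllables into words according to their tags.
    Ill-formed tag sequences are handled leniently: a B or S always
    starts a new word, and an E or S always closes the current one."""
    if len(syllables) != len(tags):
        raise ValueError("syllables and tags must have the same length")
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        if tag in ("B", "S") and current:
            words.append(current)  # flush an unfinished word
            current = []
        current.append(syllable)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:
        words.append(current)  # flush a trailing unfinished word
    return words


# Placeholder syllables stand in for real Tibetan syllables.
syllables = ["s1", "s2", "s3", "s4", "s5", "s6"]
tags = ["B", "E", "S", "B", "M", "E"]
segmented = tags_to_words(syllables, tags)
```

In this sketch a trained tagger would supply `tags`; `words_to_tags` is the inverse step that would prepare training labels from an already segmented corpus.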
Indexed by: EI
Keywords: Computational Linguistics; Random Processes
Affiliation: (1) Institute of Software, Chinese Academy of Sciences, No.4 South Fourth Street, Zhong Guan Cun, Haidian District, Beijing 100190, China; (2) Graduate University of the Chinese Academy of Sciences, No.80 Zhongguancun East Road, Haidian District, Beijing 100190, China
Language: English
Content type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16170
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Liu Huidan, Nuo Minghua, Ma Longlong, et al. Tibetan word segmentation as syllable tagging using conditional random field[J]. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 2011: 168-177.
APA Liu Huidan, Nuo Minghua, Ma Longlong, Wu Jian, & He Yeping. (2011). Tibetan word segmentation as syllable tagging using conditional random field. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 168-177.
MLA Liu Huidan, et al. "Tibetan word segmentation as syllable tagging using conditional random field". PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (2011): 168-177.