ISCAS OpenIR
Tibetan Word Segmentation as Syllable Tagging Using Conditional Random Field
Liu Huidan; Nuo Minghua; Ma Longlong; Wu Jian; He Yeping
2011
Source: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Pages: 168-177
English Abstract: In this paper, we propose a novel approach to Tibetan word segmentation using a conditional random field (CRF). We reformulate segmentation as a syllable tagging problem: the approach labels each syllable with a word-internal position tag and combines syllables into words according to their tags. Because no publicly available Tibetan word segmentation corpus exists, the training corpus is generated by another segmenter, which has an F-score of 96.94% on the test set. Two feature template sets, TMPT-6 and TMPT-10, are used and compared, and the results show that the former performs better. Experiments also show that a larger training set improves performance significantly. Trained on a set of 131,903 sentences, the segmenter achieves an F-score of 95.12% on the test set of 1,000 sentences. © 2011 by Huidan Liu, Minghua Nuo, Longlong Ma, Jian Wu, and Yeping He.
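The tagging reformulation described in the abstract can be illustrated with a minimal Python sketch. It assumes a 4-tag position scheme (B/M/E/S) that is common in CRF-based segmentation; the abstract does not name the paper's actual tag set, so this scheme, the function names `words_to_tags`, `tags_to_words` and `word_f1`, and the placeholder syllables are all hypothetical. The CRF tagger itself is not shown: the predicted tags in the demo are hand-written stand-ins for its output. The F-score is the usual word-level measure, in which a predicted word counts as correct only if its syllable span exactly matches a gold word.

```python
from typing import List, Set, Tuple

# ASSUMPTION: a 4-tag word-internal position scheme; the abstract does not
# specify the paper's actual tag set.
#   B = first syllable of a multi-syllable word
#   M = middle syllable of a word of three or more syllables
#   E = last syllable of a multi-syllable word
#   S = single-syllable word

Word = List[str]  # a word is a list of syllables


def words_to_tags(words: List[Word]) -> List[Tuple[str, str]]:
    """Convert a segmented sentence into (syllable, tag) training pairs."""
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            pairs.append((word[0], "S"))
            continue
        pairs.append((word[0], "B"))
        for syllable in word[1:-1]:
            pairs.append((syllable, "M"))
        pairs.append((word[-1], "E"))
    return pairs


def tags_to_words(syllables: List[str], tags: List[str]) -> List[Word]:
    """Combine syllables into words according to their (predicted) tags.

    A word is closed after E or S. If B or S appears while a word is still
    open (an ill-formed sequence), the open word is closed first.
    """
    words: List[Word] = []
    current: Word = []
    for syllable, tag in zip(syllables, tags):
        if tag in ("B", "S") and current:
            words.append(current)
            current = []
        current.append(syllable)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # sentence ended inside a word: keep what we have
        words.append(current)
    return words


def word_spans(words: List[Word]) -> Set[Tuple[int, int]]:
    """Map each word to its (start, end) syllable offsets in the sentence."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def word_f1(gold: List[Word], pred: List[Word]) -> float:
    """Word-level F-score: a predicted word is correct only if its span matches a gold word."""
    gold_spans, pred_spans = word_spans(gold), word_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder syllables with a made-up gold segmentation.
    gold = [["ka"], ["kha", "ga"], ["nga", "ca", "cha"]]
    pairs = words_to_tags(gold)
    print(pairs)  # tags in order: S, B, E, B, M, E

    syllables = [syl for syl, _ in pairs]
    # Hand-written stand-in for CRF output, with an error on the last word.
    predicted_tags = ["S", "B", "E", "B", "E", "S"]
    pred = tags_to_words(syllables, predicted_tags)
    print(pred)  # [['ka'], ['kha', 'ga'], ['nga', 'ca'], ['cha']]
    print(round(word_f1(gold, pred), 4))  # P = 2/4, R = 2/3, F = 4/7, about 0.5714
```

Converting gold segmentations with `words_to_tags` yields the kind of per-syllable training pairs a CRF toolkit consumes (one syllable and its tag per line, for example). The repair rule in `tags_to_words` (start a new word on any `B` or `S`) is one simple way to handle ill-formed tag sequences, not necessarily the paper's.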
Indexed Type: EI
Keywords: Computational Linguistics; Random Processes
Department: (1) Institute of Software, Chinese Academy of Sciences, No. 4 South Fourth Street, Zhong Guan Cun, Haidian District, Beijing 100190, China; (2) Graduate University of the Chinese Academy of Sciences, No. 80 Zhongguancun East Road, Haidian District, Beijing 100190, China
Language: English
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16170
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714: Liu Huidan, Nuo Minghua, Ma Longlong, et al. Tibetan word segmentation as syllable tagging using conditional random field[J]. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 2011: 168-177.
APA: Liu Huidan, Nuo Minghua, Ma Longlong, Wu Jian, & He Yeping. (2011). Tibetan word segmentation as syllable tagging using conditional random field. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 168-177.
MLA: Liu Huidan, et al. "Tibetan word segmentation as syllable tagging using conditional random field." PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (2011): 168-177.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.