Institutional Repository
| Tibetan Word Segmentation as Syllable Tagging Using Conditional Random Field | |
| Liu Huidan; Nuo Minghua; Ma Longlong; Wu Jian; He Yeping | |
| 2011 | |
| Source | PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation |
| Pages | 168-177 |
| English Abstract | In this paper, we propose a novel approach to Tibetan word segmentation using a conditional random field. We reformulate segmentation as a syllable tagging problem: the approach labels each syllable with a word-internal position tag and combines syllables into words according to their tags. As there is no publicly available Tibetan word segmentation corpus, the training corpus is generated by another segmenter, which has an F-score of 96.94% on the test set. Two feature template sets, TMPT-6 and TMPT-10, are compared, and the results show that the former is better. Experiments also show that a larger training set improves performance significantly. Trained on a set of 131,903 sentences, the segmenter achieves an F-score of 95.12% on the test set of 1,000 sentences. © 2011 by Huidan Liu, Minghua Nuo, Longlong Ma, Jian Wu, and Yeping He. |
| Indexed Type | EI |
| Keyword | Computational Linguistics; Random Processes |
| Department | (1) Institute of Software, Chinese Academy of Sciences, No.4 South Fourth Street, Zhong Guan Cun, Haidian District, Beijing 100190, China; (2) Graduate University of the Chinese Academy of Sciences, No.80 Zhongguancun East Road, Haidian District, Beijing 100190, China |
| Language | English |
| Content Type | Journal article |
| URI | http://ir.iscas.ac.cn/handle/311060/16170 |
| Collection | Institute of Software, Chinese Academy of Sciences |
| Recommended Citation GB/T 7714 | Liu Huidan, Nuo Minghua, Ma Longlong, et al. Tibetan word segmentation as syllable tagging using conditional random field[J]. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 2011: 168-177. |
| APA | Liu Huidan, Nuo Minghua, Ma Longlong, Wu Jian, & He Yeping. (2011). Tibetan word segmentation as syllable tagging using conditional random field. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 168-177. |
| MLA | Liu Huidan, et al. "Tibetan word segmentation as syllable tagging using conditional random field". PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (2011): 168-177. |
| Files in This Item | There are no files associated with this item. |
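
The abstract describes labeling each syllable with a word-internal position tag and then combining syllables into words according to those tags. The sketch below illustrates only that tag-to-word step and its inverse. It assumes a four-tag scheme (`B` = word-initial, `M` = word-medial, `E` = word-final, `S` = single-syllable word), which this record does not confirm as the authors' tag set. The function names and the romanized placeholder syllables are hypothetical, and the CRF tagger itself is not shown.

```python
from typing import List

# Assumed tag set (not confirmed by the record):
# B = first syllable of a word, M = middle, E = last, S = one-syllable word.


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Turn segmented words (lists of syllables) into per-syllable tags."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Group syllables into words according to their position tags."""
    if len(syllables) != len(tags):
        raise ValueError("syllables and tags must have the same length")
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        # A B or S tag starts a new word, so close any unfinished word first.
        if tag in ("B", "S") and current:
            words.append(current)
            current = []
        current.append(syllable)
        # An E or S tag ends the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


# Hypothetical romanized syllables, used only as placeholders.
example_syllables = ["bod", "skad", "kyi", "tshig"]
example_tags = ["B", "E", "S", "S"]
example_words = tags_to_words(example_syllables, example_tags)
```

Here `words_to_tags` corresponds to converting an already segmented corpus into tagger training labels, and `tags_to_words` to recovering words from predicted tags. Closing an unfinished word when a new `B` or `S` appears is one possible way to handle inconsistent tag sequences, chosen for illustration.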
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.