Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Title:
Tibetan word segmentation as syllable tagging using conditional random field
Author: Liu Huidan ; Nuo Minghua ; Ma Longlong ; Wu Jian ; He Yeping
Keyword: Computational linguistics ; Random processes
Source: PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation
Issued Date: 2011
Pages: 168-177
Indexed Type: EI
Department: (1) Institute of Software Chinese Academy of Sciences No.4 South Fourth Street Zhong Guan Cun Haidian District Beijing 100190 China; (2) Graduate University of the Chinese Academy of Sciences No.80 Zhongguancun East Road Haidian District Beijing 100190 China
Abstract: In this paper, we propose a novel approach to Tibetan word segmentation using conditional random fields. We reformulate segmentation as a syllable tagging problem. The approach labels each syllable with a word-internal position tag and combines syllables into words according to their tags. As there is no publicly available Tibetan word segmentation corpus, the training corpus is generated by another segmenter, which has an F-score of 96.94% on the test set. Two feature template sets, TMPT-6 and TMPT-10, are used and compared, and the results show that the former is better. Experiments also show that a larger training set improves performance significantly. Trained on a set of 131,903 sentences, the segmenter achieves an F-score of 95.12% on the test set of 1,000 sentences. © 2011 by Huidan Liu, Minghua Nuo, Longlong Ma, Jian Wu, and Yeping He.
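The tagging formulation described in the abstract can be illustrated with a minimal sketch. The record does not list the tag set the authors use, so the four tags B/M/E/S (begin, middle, end, single-syllable word) are an assumption here, as are the function names and the placeholder syllables "s1" to "s5". The sketch covers only the conversion between words and tags and a word-level F-score computed over word spans; it does not train a conditional random field, and the F-score definition shown is the common span-matching one, which may differ in detail from the authors' evaluation.

```python
from typing import List, Set, Tuple

Word = List[str]  # a word is a list of syllables


def words_to_tags(words: List[Word]) -> List[Tuple[str, str]]:
    """Turn segmented words into (syllable, tag) pairs.

    Assumed tag set (not confirmed by the source): B, M, E, S.
    """
    tagged = []
    for word in words:
        if len(word) == 1:
            tagged.append((word[0], "S"))
        else:
            tagged.append((word[0], "B"))
            for syllable in word[1:-1]:
                tagged.append((syllable, "M"))
            tagged.append((word[-1], "E"))
    return tagged


def tags_to_words(tagged: List[Tuple[str, str]]) -> List[Word]:
    """Combine tagged syllables back into words according to their tags."""
    words: List[Word] = []
    current: Word = []
    for syllable, tag in tagged:
        # B or S always starts a new word, even after a malformed sequence.
        if tag in ("B", "S") and current:
            words.append(current)
            current = []
        current.append(syllable)
        # E or S closes the current word.
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # flush a word left open at the end of the sentence
        words.append(current)
    return words


def word_spans(words: List[Word]) -> Set[Tuple[int, int]]:
    """Represent each word as a (start, end) syllable-index span."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def f_score(gold: List[Word], predicted: List[Word]) -> float:
    """Harmonic mean of word-level precision and recall."""
    gold_spans = word_spans(gold)
    predicted_spans = word_spans(predicted)
    correct = len(gold_spans & predicted_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Placeholder syllables stand in for Tibetan syllables.
gold = [["s1", "s2"], ["s3"], ["s4", "s5"]]
predicted = [["s1", "s2"], ["s3", "s4"], ["s5"]]
tags = [tag for _, tag in words_to_tags(gold)]
```

In this toy example the gold segmentation maps to the tag sequence B E S B E, and converting the tags back with tags_to_words recovers the original words. Only one of the three predicted words matches a gold word, so precision, recall, and F-score are all 1/3.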
Language: English
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16170
Appears in Collections: Institute of Software Library_Journal Articles

Files in This Item:

There are no files associated with this item.


Recommended Citation:
Liu Huidan, Nuo Minghua, Ma Longlong, et al. Tibetan word segmentation as syllable tagging using conditional random field[J]. PACLIC 25 - Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, 2011-01-01: 168-177.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
