Automatic Acquisition of Chinese-Tibetan Multi-word Equivalent Pair from Bilingual Corpora

ISCAS OpenIR

	Nuo Minghua; Liu Huidan; Ma Longlong; Wu Jian; Ding Zhiming
	2011
Conference name	2011 International Conference on Asian Language Processing, IALP 2011
Proceedings title	Proceedings - 2011 International Conference on Asian Language Processing, IALP 2011
Pages	177-180
Conference date	November 1
Conference location	Penang, Malaysia
Indexed by	EI
ISBN	9780769545547
Affiliation	(1) Institute of Software, Chinese Academy of Sciences, Beijing, China
Abstract	This paper aims to construct a Chinese-Tibetan multi-word equivalent pair dictionary for a Chinese-Tibetan computer-aided translation system. Since Tibetan is a morphologically rich language, we propose a two-phase framework to automatically extract multi-word equivalent pairs. The first phase extracts Chinese multi-word units (MWUs): we propose the CBEM model, which partitions a Chinese sentence into MWUs using two measures, collocation and binding degree. The second phase obtains Tibetan translations of the extracted Chinese MWUs: we propose the TSIM model, which focuses on extracting 1-to-n bilingual MWUs. Preliminary experimental results show that the mixed method combining the CBEM model with the TSIM model is effective. © 2011 IEEE.
Keywords	Natural Language Processing Systems
Language	English
Content type	Conference paper
URI标识	http://ir.iscas.ac.cn/handle/311060/16257
Collection	Institute of Software, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Nuo Minghua, Liu Huidan, Ma Longlong, et al. Automatic Acquisition of Chinese-Tibetan Multi-word Equivalent Pair from Bilingual Corpora[C], 2011: 177-180.
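
The abstract describes a first phase that partitions a segmented Chinese sentence into multi-word units using association measures between words. This record does not give the definitions used by the CBEM model, so the following is only a rough sketch of the general idea under stated assumptions: pointwise mutual information (PMI) over adjacent tokens stands in for the "binding degree" measure, and a sentence is cut wherever that score falls below a threshold. The function names `pmi_table` and `partition`, the threshold value, and the toy tokens are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_table(sentences):
    """Compute PMI for every adjacent token pair seen in the corpus.

    ASSUMPTION: PMI is used here as a stand-in association score;
    it is not the measure defined by the paper's CBEM model.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def partition(tokens, table, threshold):
    """Group tokens into units, cutting where adjacent-pair PMI < threshold.

    Pairs never seen in the corpus are treated as unbound and always cut.
    """
    if not tokens:
        return []
    units = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        if table.get((prev, cur), float("-inf")) >= threshold:
            units[-1].append(cur)   # strongly associated: extend current unit
        else:
            units.append([cur])     # weakly associated: start a new unit
    return units


# Toy corpus of already-segmented sentences (placeholder tokens).
corpus = [["x", "y", "z"], ["x", "y", "w"], ["z", "w"]]
table = pmi_table(corpus)
units = partition(["x", "y", "z"], table, threshold=2.0)
```

In this toy corpus the pair ("x", "y") co-occurs more often than chance predicts and is kept together, while "z" is split off. A real system would need a far larger corpus and a tuned threshold, and the second phase (aligning each Chinese unit with its Tibetan translation) is not covered by this sketch.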