Institutional Repository
| Title | Automatic Acquisition of Chinese-Tibetan Multi-word Equivalent Pair from Bilingual Corpora |
| Authors | Nuo Minghua; Liu Huidan; Ma Longlong; Wu Jian; Ding Zhiming |
| Year | 2011 |
| Conference Name | 2011 International Conference on Asian Language Processing, IALP 2011 |
| Source | Proceedings - 2011 International Conference on Asian Language Processing, IALP 2011 |
| Pages | 177-180 |
| Conference Date | November 1 |
| Conference Place | Penang, Malaysia |
| Indexed Type | EI |
| ISBN | 9780769545547 |
| Department | Institute of Software, Chinese Academy of Sciences, Beijing, China |
| English Abstract | This paper aims to construct a Chinese-Tibetan multi-word equivalent pair dictionary for a Chinese-Tibetan computer-aided translation system. Since Tibetan is a morphologically rich language, we propose a two-phase framework to automatically extract multi-word equivalent pairs. The first phase extracts Chinese multi-word units (MWUs): we propose the CBEM model, which partitions a Chinese sentence into MWUs using two measures, collocation and binding degree. The second phase obtains Tibetan translations of the extracted Chinese MWUs: we propose the TSIM model, which focuses on extracting 1-to-n bilingual MWUs. Preliminary experimental results show that the mixed method combining the CBEM and TSIM models is effective. © 2011 IEEE. |
| Keyword | Natural Language Processing Systems |
| Language | English |
| Content Type | Conference Paper |
| URI | http://ir.iscas.ac.cn/handle/311060/16257 |
| Collection | Institute of Software, Chinese Academy of Sciences |
| Recommended Citation GB/T 7714 | Nuo Minghua, Liu Huidan, Ma Longlong, et al. Automatic acquisition of Chinese-Tibetan multi-word equivalent pair from bilingual corpora[C], 2011: 177-180. |
| Files in This Item | There are no files associated with this item. |
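The abstract does not give the formulas behind CBEM's collocation and binding-degree measures, so the following is only a rough illustration of the first phase under stated assumptions. It swaps in pointwise mutual information (PMI) between adjacent, already word-segmented Chinese tokens as a single stand-in association score, then greedily groups adjacent words into multi-word units when their score reaches a threshold. The names `pmi_table`, `segment_mwus`, and `threshold` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_table(corpus):
    """PMI for every adjacent word pair in a pre-segmented corpus
    (a list of sentences, each a list of words).
    Assumption: PMI stands in for CBEM's collocation/binding-degree measures."""
    unigrams = Counter()
    bigrams = Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi          # probability of the adjacent pair
        p_a = unigrams[a] / n_uni    # probability of each word on its own
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def segment_mwus(sent, pmi, threshold):
    """Greedy left-to-right pass (hypothetical stand-in for CBEM partitioning):
    attach a word to the current unit when its PMI with the preceding word is
    at least `threshold`; otherwise start a new unit. Units are returned as tuples."""
    if not sent:
        return []
    units = [[sent[0]]]
    for prev, word in zip(sent, sent[1:]):
        # Unseen pairs get -inf, so they never merge.
        if pmi.get((prev, word), float("-inf")) >= threshold:
            units[-1].append(word)
        else:
            units.append([word])
    return [tuple(u) for u in units]
```

For example, `segment_mwus(["计算机", "辅助", "翻译"], pmi_table(corpus), 2.0)` would group adjacent words only where they co-occur in `corpus` noticeably more often than chance. The second phase (TSIM's 1-to-n Tibetan alignment) is not sketched, since the abstract gives no detail about how it scores candidates.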