Institutional Repository of the Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所机构知识库)
Subject: Computer Science
Title: SegT: A Practical Tibetan Word Segmentation System
Original Title: SegT:一个实用的藏文分词系统
Author: 刘汇丹 ; 诺明花 ; 赵维纳 ; 吴健 ; 贺也平
Keyword: Tibetan word segmentation ; case-auxiliary words ; critical word detection ; word frequency statistics ; Tibetan information processing ; Chinese information processing
Source: Journal of Chinese Information Processing
Issued Date: 2012
Volume: 26, Issue: 1, Pages: 97-103
Indexed Type: CSCD, CNKI, Wanfang
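Department: 刘汇丹, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China. 诺明花, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China. 吴健, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China. 贺也平, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China. 赵维纳, Beijing Language and Culture University, Beijing 100083, China.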
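Abstract: Building on an analysis of existing Tibetan word segmentation methods, this paper studies block splitting at case-auxiliary words, critical word identification, word frequency statistics, and the detection and resolution of intersection-type (overlapping) ambiguities, and proposes a method for each. Applying these methods, the authors design and implement a Tibetan word segmentation system, SegT. The system splits text into blocks at case-auxiliary words and identifies critical words, then segments each block by maximum matching and identifies abbreviated (contracted) words. It detects intersection-type ambiguous fields by bidirectional segmentation and resolves them with precomputed word frequency information. Experiments show that the proposed block splitting and critical word identification methods raise segmentation speed by about 15%, while block splitting at case-auxiliary words neither noticeably improves nor degrades segmentation quality. The final segmentation accuracy of the system is 96.98%, which is broadly adequate for practical use.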
English Abstract: This paper designs and implements a Tibetan word segmentation system named "SegT". It identifies critical words with a fast algorithm based on the trie structure while it splits each Tibetan sentence into blocks at case-auxiliary words. Then, it identifies abbreviated words while it segments each block into words by maximum matching. Finally, it detects ambiguities by bidirectional segmentation and resolves them by word frequency. Experiments show that the block segmentation method based on case-auxiliary words improves the segmenting speed by about 15%, but it does not significantly increase or decrease the precision. The precision of the system reaches 96.98%, which shows that it is a practical system.
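As a rough illustration of the pipeline the abstract describes (maximum matching, ambiguity detection by bidirectional segmentation, and resolution by word frequency), the following Python sketch may help. It is not the SegT implementation: the function names, the toy lexicon `TOY_FREQ`, the Latin-letter placeholder syllables, and the rule of comparing the summed frequency of the whole forward and backward results are all assumptions made for illustration. Block splitting at case-auxiliary words, critical word identification, and abbreviated word identification are omitted.

```python
from typing import Dict, List, Sequence, Tuple

Word = Tuple[str, ...]  # a word is a tuple of syllables

# Hypothetical toy lexicon: word -> corpus frequency (placeholder syllables).
TOY_FREQ: Dict[Word, int] = {
    ("a", "b"): 5,
    ("b", "c"): 50,
    ("a",): 1,
    ("c",): 1,
}


def forward_max_match(syllables: Sequence[str], lexicon: Dict[Word, int],
                      max_len: int) -> List[Word]:
    """Greedy left-to-right segmentation, longest lexicon entry first."""
    words: List[Word] = []
    i = 0
    while i < len(syllables):
        for size in range(min(max_len, len(syllables) - i), 0, -1):
            cand = tuple(syllables[i:i + size])
            # A single syllable is always accepted as a fallback.
            if size == 1 or cand in lexicon:
                words.append(cand)
                i += size
                break
    return words


def backward_max_match(syllables: Sequence[str], lexicon: Dict[Word, int],
                       max_len: int) -> List[Word]:
    """Greedy right-to-left segmentation, longest lexicon entry first."""
    words: List[Word] = []
    j = len(syllables)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            cand = tuple(syllables[j - size:j])
            if size == 1 or cand in lexicon:
                words.append(cand)
                j -= size
                break
    words.reverse()
    return words


def segment(syllables: Sequence[str], freq: Dict[Word, int],
            max_len: int = 4) -> List[Word]:
    """Segment in both directions; a disagreement signals an ambiguity."""
    fwd = forward_max_match(syllables, freq, max_len)
    bwd = backward_max_match(syllables, freq, max_len)
    if fwd == bwd:
        return fwd
    # Assumed tie-breaking rule: prefer the result with the higher
    # total word frequency (unknown words count as 0).
    def score(ws: List[Word]) -> int:
        return sum(freq.get(w, 0) for w in ws)
    return fwd if score(fwd) >= score(bwd) else bwd


print(segment(["a", "b", "c"], TOY_FREQ))  # [('a',), ('b', 'c')]
```

Here the forward pass yields `ab | c` and the backward pass yields `a | bc`; the two disagree, so the sketch keeps the backward result because `bc` is far more frequent in the toy lexicon. A real system would more likely score only the ambiguous field rather than the whole block.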
Language: Chinese
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/14696
Appears in Collections: Institute of Software Library, Journal Articles (软件所图书馆_期刊论文)

Files in This Item:
File Name/ File Size Content Type Version Access License
SegT一个实用的藏文分词系统.pdf (1024KB) ---- Restricted access; contact to obtain the full text

Recommended Citation:
刘汇丹, 诺明花, 赵维纳, et al. SegT: A Practical Tibetan Word Segmentation System [J]. Journal of Chinese Information Processing, 2012-01-01, 26(1): 97-103.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
