Institutional Repository of the Institute of Software, Chinese Academy of Sciences
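Title:
基于词汇关系的个性化拼音输入法研究与实现
Author: 张玮
Defense date: 2007-06-07
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software
Degree: Doctoral
Keywords: Pinyin input method; statistical language model; word collocation; topic prediction; personalization
Alternative title: Study and Implementation of the Personalized Pinyin Input Method based on Word Relations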
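Abstract: Chinese character input is a fundamental key technology specific to Chinese information processing. Although some Chinese input products do not rely on a keyboard, and many input methods are based on character shapes, the Pinyin input method is undoubtedly the Chinese input tool most commonly used by Internet users. With the rapid growth of the Internet, and especially since Web 2.0 introduced its user-centered philosophy, users' enthusiasm for participating online keeps rising, and with it the demand for efficient Chinese input. To further improve the performance of Pinyin input methods, and building on an analysis of their current state, we carried out research in the following areas. First, we implemented a Pinyin input method based on word collocation. To address the limited ability of statistical language models to describe long-range dependencies between words, we propose using word collocations to capture long-distance word relations and thereby improve the efficiency of the Pinyin input method. Second, we implemented a Pinyin input method based on topic prediction. Starting from the observation that words themselves carry topic characteristics, we integrate a classification engine into the input method system, use the user's input history to determine the topic of the current input, and use that topic information to predict the user's subsequent input, improving the input method's performance. Third, we studied how the user's personalization factors (input history and IE browsing history) improve the performance of the Pinyin input method system. On one hand, online learning over the user's input history discovers user-specific words and word collocations, which are applied to the input method system in real time. On the other hand, the user's IE browsing history is mined periodically, and the client-side data is used to build a personalized language model, which is then combined with the original 3-gram model for use in the input method system.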
English abstract: The Chinese input method is a fundamental key technology in the field of Chinese information processing. Although some products do not rely on the keyboard and others are based on the shape of Chinese characters, the Pinyin input method is undoubtedly the Chinese input method most commonly used by Internet users. With the development and popularization of the Internet, and especially with the user-centered idea of the Web 2.0 era, more and more Chinese users participate enthusiastically in the Internet, so the efficiency of Chinese input matters more than ever before. To further enhance the performance of the Chinese Pinyin input method, the thesis analyses the current status of Pinyin input methods and then presents methods to improve their performance in three respects. First, we construct a collocation-based Pinyin input method. Because statistical language models capture long-range dependencies poorly, the thesis uses collocations to improve the performance of our Pinyin input method. Second, we implement a Pinyin input method based on topic prediction. Starting from the topic characteristics of words, we embed a classification engine in the Pinyin input method, use the input history to judge the current topic, and then increase the weights of the candidate words that are related to the current topic. The experiments demonstrate that topic information contributes to the efficiency of the Chinese input method. Third, we study the impact of personalization on the performance of the Pinyin input system. On one hand, through online learning we discover user-specific words and collocations from the user's input history. On the other hand, we periodically mine the user's IE browsing history and use the client-side data to train a personalized language model, which is integrated with the original 3-gram model.
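The abstract says that a personalized language model built from client-side data is combined with the original 3-gram model, but it does not say how. The sketch below shows one common way to combine two models, linear interpolation. The interpolation weight `lam`, the function names, the toy corpora, and the simplification from 3-gram to unigram probabilities are all assumptions made for illustration, not details taken from the thesis.

```python
from collections import Counter


def unigram_model(corpus):
    """Estimate maximum-likelihood unigram probabilities from a list of words."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(base_prob, user_prob, lam):
    """Linear interpolation: lam weights the base model, (1 - lam) the user model."""
    return lam * base_prob + (1.0 - lam) * user_prob


def rank_candidates(candidates, base_model, user_model, lam=0.5):
    """Order candidate words for one pinyin input by interpolated probability."""
    def score(word):
        # Words unseen by a model get probability 0.0 from that model.
        return interpolate(base_model.get(word, 0.0), user_model.get(word, 0.0), lam)

    return sorted(candidates, key=score, reverse=True)


# Hypothetical toy data: candidates that share the pinyin "shi".
base = unigram_model(["是", "是", "是", "事", "市"])   # stands in for the general model
user = unigram_model(["市", "市", "市", "事"])         # stands in for the user's history

print(rank_candidates(["是", "事", "市"], base, user, lam=0.5))
```

With `lam=1.0` the ranking falls back to the base model alone, so the weight controls how strongly the user's own data reorders the candidates.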
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/5776
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name / File Size | Content Type | Version | Access | License
10001_200428015029066张玮_paper.pdf (718KB) | -- | -- | Restricted access | --
Contact to obtain full text

Recommended Citation:
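张玮. 基于词汇关系的个性化拼音输入法研究与实现 (Study and Implementation of the Personalized Pinyin Input Method based on Word Relations) [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2007-06-07.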
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.