Institutional Repository of the Institute of Software, Chinese Academy of Sciences (ISCAS OpenIR)
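Title:
矢量笔迹混排文本的分割与识别方法研究
Author: 张堃
Defense date: 2008-06-02
Advisor: 张习文
Major: Computer Software and Theory
Degree-granting institution: Graduate University of Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Master's
Keywords: digital ink text ; ink segmentation ; isolated character recognition ; continuous character recognition ; visualization ; human-computer interaction
Alternative title: Research on Methods of Segmentation and Recognition toward Ink Mixed Text Document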
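Abstract: Digital ink is captured by pen-based computer input devices such as digital pens, and it consists of strokes. Each stroke contains time-ordered sampling points, and each sampling point carries coordinates, a time, and a pressure value. The characters that make up Chinese digital ink text are complex: for example, they are of diverse types and are separated by small gaps. Structuring and symbolization are the foundation for intelligent processing of Chinese digital ink text, so this thesis studies segmentation and recognition techniques in depth. Specifically: (1) to handle the complexity of characters in mixed Chinese digital ink text, an iterative extraction method is proposed; (2) to handle overlapping elements in the segmentation results and to reduce the user's error-checking burden, adaptive visualization and a corresponding interactive correction method are proposed; (3) for recognition of mixed text as a whole, multiple features are combined for classification, several classifiers are compared, and a classification method based on a support vector machine is adopted, which automatically determines the detailed language category, including Chinese characters, English words, English letters, digits, and punctuation marks; (4) for isolated character recognition, a library of Chinese character radical composition information is built, and a recognition post-processing method based on the principle of consistency between components and the whole is proposed; (5) based on continuous word recognition results, a continuous-recognition post-processing method that uses lexicon information is built with a mechanical dictionary, and on that basis visualization and context-based interactive correction are implemented; (6) a prototype system is designed and developed, and a number of data sets are tested and evaluated in depth.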
English abstract: Digital ink is captured by computer input devices such as the Anoto pen and paper or a Tablet PC. It consists of strokes. The sampling points in each stroke are ordered by sampling time, and each sampling point contains coordinates, a sampling time, and a pressure value. Chinese digital ink text contains characters with complex structures, multiple languages, and small gaps. Such text must be structured and symbolized before it can be put to advanced use. The thesis therefore focuses on the segmentation and recognition of Chinese digital ink text, as follows: 1. Because the ink characters in Chinese digital ink text are complex, they are extracted in multiple steps. The text can contain both Chinese and English. 2. Because some components of the segmented text overlap, they are adaptively visualized, which also reduces the user's correction burden. Wrongly extracted components are corrected interactively on the visualized results. 3. Ink characters are classified into detailed recognition types by a support vector machine that uses many features. 4. Isolated ink characters are recognized based on both their components and their whole shapes. 5. Ink characters are recognized continuously based on words and word pairs, and the recognition results are then visualized. Wrongly recognized sentences, words, and characters are corrected. 6. A software prototype is developed, and many Chinese digital ink texts are segmented and recognized with it. The processed results are evaluated in detail.
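The ink data model described in the abstract (strokes made of time-ordered sampling points, each with coordinates, a sampling time, and a pressure value) could be represented roughly as below. This is only an illustrative sketch: the names `SamplePoint`, `Stroke`, and `bounding_box`, the field types, and the sample values are assumptions introduced here and are not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SamplePoint:
    """One pen sample. Names and units are hypothetical."""
    x: float         # horizontal coordinate
    y: float         # vertical coordinate
    t: float         # sampling time
    pressure: float  # pen pressure


@dataclass
class Stroke:
    """A single pen trace whose sampling points are kept in time order."""
    points: List[SamplePoint] = field(default_factory=list)

    def add(self, p: SamplePoint) -> None:
        # Append, then re-sort by sampling time so the order stays temporal
        # even if samples arrive out of order.
        self.points.append(p)
        self.points.sort(key=lambda q: q.t)

    def bounding_box(self) -> Tuple[float, float, float, float]:
        # Returns (min_x, min_y, max_x, max_y); raises ValueError if empty.
        xs = [p.x for p in self.points]
        ys = [p.y for p in self.points]
        return (min(xs), min(ys), max(xs), max(ys))


# Made-up sample values, added out of time order on purpose.
stroke = Stroke()
stroke.add(SamplePoint(2.0, 5.0, t=0.02, pressure=0.6))
stroke.add(SamplePoint(1.0, 4.0, t=0.01, pressure=0.5))
```

A bounding box per stroke is a common starting point for measuring gaps between neighbouring strokes, but whether the thesis segments text this way is not stated in the abstract.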
Content type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/6844
Appears in Collections: Library of the Institute of Software, Chinese Academy of Sciences (Early Works)

Files in This Item:
File Name / File Size | Content Type | Version | Access | License
10001_200528015029067张堃_paper.pdf (1965KB) | -- | -- | Restricted access | --

Recommended Citation:
张堃. 矢量笔迹混排文本的分割与识别方法研究 [D]. Institute of Software, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2008-06-02.
