矢量笔迹混排文本的分割与识别方法研究
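Alternative Title: Research on Methods of Segmentation and Recognition toward Ink Mixed Text Document
张堃 (Zhang Kun)
Major: Computer Software and Theory (计算机软件与理论)
Supervisor: 张习文 (Zhang Xiwen)
2008-06-02
Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Degree Level: Master's
Place of Degree Grantor: Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)
Keyword: digital ink text; ink segmentation; isolated character recognition; continuous character recognition; visualization; human-computer interaction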
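Abstract (translated from Chinese): Digital (vector) ink is captured by pen-based computer input devices such as digital pens, and it consists of strokes. Each stroke contains time-ordered sampling points, and each sampling point carries attributes such as coordinates, time, and pressure. The characters that make up Chinese digital ink text are complex: they are of many types and are closely spaced. Structuring and symbolizing the ink are the basis for intelligent processing of Chinese digital ink text, so this thesis studies segmentation and recognition in depth, as follows: (1) To handle the complexity of the characters in mixed-script Chinese digital ink text, an iterative extraction method is proposed. (2) Because elements in the segmentation results overlap, and to reduce the user's burden of finding errors, an adaptive visualization and a corresponding interactive correction method are proposed. (3) To recognize mixed text as a whole, multiple features are combined for classification; several classifiers were compared and a support vector machine (SVM) based method was adopted, which automatically determines the detailed language category of each element, including Chinese characters, English words, English letters, digits, and punctuation. (4) For isolated character recognition, a database of the radical composition of Chinese characters is built, and a recognition post-processing method based on the consistency between a character's components and the character as a whole is proposed. (5) Starting from word-level continuous recognition results, a continuous-recognition post-processing method that exploits lexicon information is built on a mechanical-matching dictionary; on this basis, visual presentation of the results and context-based interactive correction are implemented. (6) A prototype system is designed and developed, and several datasets are tested and evaluated in depth.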
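Both abstracts describe digital ink the same way: strokes made of time-ordered sampling points that carry coordinates, time, and pressure. The Python sketch below models that structure under stated assumptions. The class names `SamplePoint` and `Stroke`, the helper `group_by_x_gap`, its `min_gap` parameter, and the units of time and pressure are all hypothetical. `group_by_x_gap` is a deliberately naive baseline, not the thesis's iterative extraction method: it groups strokes into candidate characters purely by horizontal gaps, which illustrates why the small inter-character gaps mentioned in the abstract make fixed-threshold segmentation fragile.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SamplePoint:
    """One pen sample: coordinates, sampling time and pressure."""
    x: float
    y: float
    t: float         # sampling time; the unit (e.g. ms) is an assumption
    pressure: float  # pen pressure; the scale (e.g. 0..1) is an assumption


@dataclass
class Stroke:
    """One pen-down to pen-up trace, stored as time-ordered sample points."""
    points: List[SamplePoint] = field(default_factory=list)

    def is_time_ordered(self) -> bool:
        # Sampling points within a stroke should be ordered by sampling time.
        return all(a.t <= b.t for a, b in zip(self.points, self.points[1:]))

    def x_extent(self) -> Tuple[float, float]:
        # Leftmost and rightmost x; assumes the stroke has at least one point.
        xs = [p.x for p in self.points]
        return min(xs), max(xs)


def group_by_x_gap(strokes: List[Stroke], min_gap: float) -> List[List[Stroke]]:
    """Hypothetical naive baseline, NOT the thesis's iterative extraction:
    sort strokes by left edge and start a new group whenever the horizontal
    gap to the current group's right edge exceeds min_gap."""
    groups: List[List[Stroke]] = []
    right_edge = float("-inf")
    for s in sorted(strokes, key=lambda st: st.x_extent()[0]):
        left, right = s.x_extent()
        if groups and left - right_edge <= min_gap:
            groups[-1].append(s)
            right_edge = max(right_edge, right)
        else:
            groups.append([s])
            right_edge = right
    return groups


if __name__ == "__main__":
    def make_stroke(x0: float, x1: float, t0: float) -> Stroke:
        # Two-point stroke from (x0, 0) to (x1, 1) lasting 10 time units.
        return Stroke([SamplePoint(x0, 0.0, t0, 0.5), SamplePoint(x1, 1.0, t0 + 10, 0.6)])

    strokes = [make_stroke(0, 4, 0), make_stroke(3, 6, 20), make_stroke(9, 12, 40)]
    print(all(s.is_time_ordered() for s in strokes))       # True
    print([len(g) for g in group_by_x_gap(strokes, 2.0)])  # [2, 1]
    print([len(g) for g in group_by_x_gap(strokes, 3.0)])  # [3]
```

In the demo, raising `min_gap` from 2 to 3 merges the third stroke into the first group, so a single threshold either splits or merges closely spaced characters.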
English Abstract: Digital ink can be captured by pen-based input devices such as the Anoto pen and paper or a Tablet PC. It consists of strokes, and the sampling points in each stroke are ordered by sampling time. Each sampling point records coordinates, a sampling time, and pressure. Chinese digital ink text contains characters with complex structures, mixes multiple languages, and has small gaps between characters. It must be structured and symbolized before it can be put to more advanced use. This thesis therefore focuses on the segmentation and recognition of Chinese digital ink text, as follows: 1. Because the characters are complex, ink characters are extracted from Chinese digital ink text in multiple steps; the text may contain both Chinese and English. 2. Because some segmented components overlap, the components of segmented text are visualized adaptively, which also reduces the user's correction burden; wrongly extracted components are then corrected interactively on the visualized results. 3. Ink characters are classified into detailed recognition types, such as Chinese characters, English words, English letters, digits, and punctuation, by a support vector machine that uses many features. 4. Isolated ink characters are recognized based on the consistency between their components and the character as a whole. 5. Ink characters are recognized continuously based on words and word pairs, the recognition results are visualized, and wrongly recognized sentences, words, and characters are corrected interactively using context. 6. A software prototype is developed, many Chinese digital ink texts are segmented and recognized with it, and the results are evaluated in detail.
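The Chinese abstract says the continuous-recognition post-processing is built on a "mechanical" dictionary (机械字典). In Chinese word segmentation, mechanical matching usually means string matching against a dictionary, and forward maximum matching is its simplest form. The sketch below assumes that reading and is not the thesis's algorithm: each ink character has a ranked list of recognizer candidates, and the hypothetical function `rerank` picks the candidate string whose forward-maximum-matching segmentation covers the most characters with multi-character lexicon words, breaking ties by recognizer rank. The names `forward_max_match`, `lexicon_coverage`, `rerank`, `max_len`, and the toy lexicon are all illustrative.

```python
from itertools import product
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon
    word (up to max_len characters), falling back to a single character."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: List[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:
                words.append(piece)
                i += n
                break
    return words


def lexicon_coverage(text: str, lexicon: Set[str], max_len: int) -> int:
    # Number of characters covered by multi-character lexicon words.
    return sum(len(w) for w in forward_max_match(text, lexicon, max_len) if len(w) > 1)


def rerank(candidates: List[List[str]], lexicon: Set[str], max_len: int = 4) -> str:
    """candidates[k] holds the recognizer's candidates for the k-th ink
    character, best first. Exhaustive search: fine for a sketch, exponential
    in sentence length."""
    best, best_key = "", None
    for idx in product(*(range(len(c)) for c in candidates)):
        text = "".join(c[j] for c, j in zip(candidates, idx))
        # Higher coverage wins; among equals, prefer better recognizer ranks.
        key = (lexicon_coverage(text, lexicon, max_len), -sum(idx))
        if best_key is None or key > best_key:
            best, best_key = text, key
    return best


if __name__ == "__main__":
    lexicon = {"识别", "方法"}
    # Top-1 candidates read "只别万法"; lexicon matching recovers "识别方法".
    cands = [["只", "识"], ["别"], ["万", "方"], ["法"]]
    print(rerank(cands, lexicon))  # 识别方法
```

A practical post-processor would more likely search a candidate lattice with dynamic programming or beam search, and could also score the word pairs mentioned in the English abstract, which this sketch ignores.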
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/6844
Collection: 早期 (Early)
Recommended Citation (GB/T 7714):
张堃. 矢量笔迹混排文本的分割与识别方法研究[D]. 中国科学院软件研究所. 中国科学院研究生院,2008.
Files in This Item: 10001_20052801502906 (1965 KB); access restricted (限制开放), full text available on application.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.