Institutional Repository of the Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所机构知识库)
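Title:
数字笔迹的结构分析与识别 (Structural Analysis and Recognition of Digital Ink)
Author: 敖翔 (Ao Xiang)
Defense date: 2007-01-15
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place conferred: Institute of Software
Degree: Doctorate
Keywords: digital ink; structure analysis; multi-oriented text lines; text/graphics separation; flowchart; multimodal error correction; intelligent whiteboard
Alternative title: Analyzing and Recognizing the Structures of Digital Ink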
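Abstract: Digital ink refers to online stroke data produced by pen input devices and carrying rich attributes such as spatial, temporal, and pressure information. It emerged with the rise of pen-based user interfaces built on the "pen and paper" metaphor, and has attracted wide attention, research, and application. Digital ink is a natural and fast way to capture and use information. In a pen-based user interface that uses digital ink, users can write ink that simulates the effect of physical pen and paper, while the computer can understand and recognize the ink, store and transmit it efficiently, and retrieve it quickly, offering conveniences that physical pen and paper lack. As important enabling technologies, digital ink computing techniques underpin the naturalness and efficiency of pen-based user interfaces.
Among the various digital ink computing techniques, ink structure analysis is especially important. It is what gives a pen-based user interface its intelligence: it lets the computer understand the ink the user inputs and obtain a structured representation of it, laying the foundation for structured editing of ink and for converting ink into normalized data. The research in this thesis follows this line of thought. The thesis first outlines the background of digital ink, its composition, and ink computing techniques, with particular attention to research on ink structure and on its analysis and recognition. It then discusses three problems (multi-oriented text line extraction, text/graphics separation, and flowchart structure understanding), analyzes a multimodal method for correcting handwriting recognition errors, and finally presents a pen-based intelligent whiteboard meeting system. The research content and contributions are as follows:
1. A method for extracting multi-oriented text lines from digital ink. Text lines are an important textual ink structure, but because ink is inherently irregular, multi-oriented text line structure is not easy to extract. This thesis proposes a method, based on visual perception, for extracting multi-oriented text lines from digital ink. The method works bottom-up: it first groups strokes that are adjacent in space and time into stroke blocks, then builds a link model over these blocks to evaluate potential line arrangements, and finally uses branch-and-bound search to find the optimal line arrangement. Experiments show that the method effectively extracts multi-oriented text line structure from digital ink and also applies to curved text lines.
2. A method for separating text and graphics in digital ink. Handwritten ink often contains both text and graphics. This thesis presents a text/graphics separation method for digital ink. The method takes the view that graphical ink is ink that lacks a reasonable text arrangement. It therefore first applies text analysis to give the ink a textual structure. Then, taking the stroke block as the unit, it extracts features of the block itself and of the text arrangement around it, and uses an SVM classifier to classify each stroke block as text or graphics. A comparative evaluation against three other text/graphics separation algorithms shows that the method classifies well and is suitable for digital ink that is not tied to a specific domain.
3. A method for understanding flowchart structure in digital ink. Flowcharts have a rich spatial information structure. This thesis proposes a flowchart structure understanding method for digital ink. The method first uses features of individual strokes and of their spatial and temporal context, with an SVM classifier, to distinguish text strokes from graphical strokes in the flowchart. It then uses a graph search algorithm and a vertex coloring algorithm to find the flowchart's "container-connector" structure among the graphical strokes. Finally, it fills the text strokes into this structure, completing the flowchart structure understanding. The thesis also proposes a method that uses pen gestures to correct flowchart recognition errors, as well as structured editing operations based on the structured flowchart. Evaluation shows that the algorithm effectively extracts flowchart structure from ink and that the structured editing is natural and efficient.
4. A multimodal error correction method for continuous handwritten Chinese character recognition. In systems that use handwriting recognition, user satisfaction depends not only on the recognition rate but also on the process of correcting recognition errors. This thesis proposes a multimodal method, based on repeating the text by speech, for correcting errors in continuous handwriting recognition. The method can correct character extraction errors as well as character recognition errors. Its core is a multimodal fusion algorithm, which uses the user's speech to constrain the search for the optimal handwriting recognition result. Evaluation shows that the fusion algorithm effectively corrects both kinds of errors, and that the proposed method corrects errors more efficiently than two other handwriting recognition error correction methods.
5. A pen-based intelligent whiteboard meeting system. The whiteboard is an information exchange tool used on many occasions. This thesis presents a pen-based intelligent whiteboard meeting system built on an electronic whiteboard. The system targets office workers and is intended for informal meetings. Through pen interaction, users can prepare a whiteboard outline and conduct the meeting. When the meeting ends, users can organize the handwritten ink into formal meeting minutes. The system is built on the ink computing toolkit InkLib, whose development was led by the author, and uses many ink computing techniques to ensure natural, efficient whiteboard interaction and convenient tidying of the whiteboard ink.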
English abstract: Digital ink refers to the online strokes that are generated by pen input devices and fed to a computer. It is the major data type in pen-based user interfaces, and with the rise of such interfaces, great effort has gone into studying and applying it. In pen-based user interfaces with digital ink, users' handwriting can be rendered so that it closely mimics handwriting produced with physical pen and paper. It can also be intelligently recognized, efficiently stored, rapidly transferred, and conveniently retrieved by computers. In short, digital ink is much more powerful than its physical counterpart. As essential enabling technologies, digital ink and its computing technologies make pen-based user interfaces natural and effective.
Among the various ink computing technologies, ink structure understanding is a very important one: it is precisely this technology that interprets the user's pen input and analyzes its structure. The thesis therefore focuses on analyzing and recognizing the structures of digital ink. It first introduces the background of digital ink, and then the ink computing technologies, especially the related work on ink structure understanding. Next, it discusses three ink structure understanding problems: extracting multi-oriented ink text lines, separating text and graphics in ink, and extracting the structure of hand-drawn diagrams. The thesis also introduces a multimodal error correction technique, which enables users to correct errors in continuous handwriting recognition by speech. Finally, it introduces a pen-based whiteboard system, which uses many ink structure understanding technologies to make it natural and effective to use. The main contributions of the thesis are:
1. A method to extract multi-oriented text lines of digital ink. The text line is an important textual ink structure. Because of the informality of ink, multi-oriented text lines are not easy to extract. This thesis proposes a perception-based method to extract multi-oriented text lines of digital ink. The method first groups strokes that are temporally and spatially adjacent into blocks. It then builds a link model over the blocks to evaluate all potential text line layouts. Finally, it finds the best text line layout with a branch-and-bound search. The evaluation shows the method is effective. Moreover, the method also applies well to extracting curved text lines.
2. A method to separate text and graphics in digital ink. Users' handwriting often mixes graphical and textual content. This thesis introduces a method to separate the graphical strokes from the textual strokes in digital ink. Because graphical strokes cannot form reasonable text layouts, the method first extracts the text layout of the ink. It then extracts the features of every stroke block, together with the features of the text layout formed by the strokes of the block and the strokes surrounding it. Finally, it uses an SVM classifier to classify every stroke block as text or graphics. Evaluation shows that, compared with three other text/graphics separation methods, the method performs best. It is suitable for general-purpose text/graphics separation of ink.
3. A method to extract the structure of hand-drawn diagrams. A hand-drawn diagram carries abundant information. This thesis introduces a method to extract the structure of hand-drawn diagrams. The method first separates the textual and graphical strokes in the diagram using the features of each single stroke and its context. It then finds the "container-connector" structure among the graphical strokes using a graph search algorithm and a vertex coloring algorithm. Finally, the textual strokes are placed into the "container-connector" structure. The thesis also describes how to recover from extraction errors with pen gestures and how to manipulate the diagram structurally. Evaluation shows the extraction method is effective and the structured manipulation is efficient.
4. Multimodal error correction of continuous Chinese handwriting recognition. In recognition-based interfaces, users' satisfaction is determined not only by the recognition rate but also by the process of recovering from recognition errors. This thesis introduces a multimodal error correction technique, which allows users to correct both character extraction errors and character recognition errors in continuous Chinese handwriting. The key to the technique is a multimodal fusion algorithm, which uses the user's speech as a constraint on the search for the best handwriting recognition result. Evaluation shows that the error correction technique is effective. Compared with two other error correction techniques, it is more efficient.
5. A pen-based electronic whiteboard system. The whiteboard is a useful tool for capturing and exchanging ideas. This thesis introduces a pen-based electronic whiteboard system. The system is mainly for office use. A user can prepare the outline of a meeting by pen-based interaction, and the whole process of the meeting is pen-supported. After the meeting, the user can tidy up the ink produced by the discussion during the meeting. The whiteboard system is based on an ink computing toolkit (InkLib), which was mainly developed by the author of the thesis.
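The first step of contribution 1, grouping temporally and spatially adjacent strokes into blocks, can be illustrated with a minimal Python sketch. This is not the thesis's algorithm or InkLib's API: the `Stroke` type, the bounding-box distance, and both thresholds (`max_time_gap`, `max_space_gap`) are assumptions made only for illustration, and the link model and branch-and-bound search that follow this step are not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Stroke:
    """Hypothetical stroke record: sampled points plus pen-down/pen-up times."""
    points: List[Tuple[float, float]]
    t_start: float
    t_end: float


def bbox(points: List[Tuple[float, float]]) -> Box:
    """Axis-aligned bounding box of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes; 0.0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def group_strokes(strokes: List[Stroke],
                  max_time_gap: float = 0.5,
                  max_space_gap: float = 20.0) -> List[List[Stroke]]:
    """Greedily merge each stroke into the current block when it is close
    to that block in both time and space; otherwise start a new block.
    Both thresholds are illustrative assumptions, not values from the thesis."""
    blocks: List[List[Stroke]] = []
    for s in sorted(strokes, key=lambda st: st.t_start):
        if blocks:
            current = blocks[-1]
            block_box = bbox([p for st in current for p in st.points])
            close_in_time = s.t_start - current[-1].t_end <= max_time_gap
            close_in_space = bbox_gap(block_box, bbox(s.points)) <= max_space_gap
            if close_in_time and close_in_space:
                current.append(s)
                continue
        blocks.append([s])
    return blocks


if __name__ == "__main__":
    demo = [
        Stroke([(0, 0), (10, 10)], 0.0, 0.2),
        Stroke([(12, 0), (20, 10)], 0.3, 0.5),    # near in time and space
        Stroke([(200, 0), (210, 10)], 0.6, 0.8),  # near in time, far in space
    ]
    print([len(b) for b in group_strokes(demo)])
```

Requiring both conditions keeps strokes written far apart on the page in separate blocks even when they follow each other quickly, which matches the abstract's description of blocks as strokes adjacent in both time and space.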
Language: Chinese
Content type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/7556
Appears in Collections: 中科院软件所 (Institute of Software, CAS)

Files in This Item:
File Name/ File Size Content Type Version Access License
10001_200318015003110敖翔_paper.pdf (4946KB) -- -- Restricted access -- Contact to obtain full text

Recommended Citation:
敖翔. 数字笔迹的结构分析与识别[D]. 软件研究所. 中国科学院软件研究所. 2007-01-15.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
