Research on Several Problems in the Application of Form Recognition Systems (表格识别系统应用中若干问题的研究)
Alternative title: A Study on Some Problems in the Application of Form Recognition System
卜飞宇
Major: Computer Application Technology
2004
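Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Degree: Doctoral
Place of degree conferral: Institute of Software, Chinese Academy of Sciences
Keywords: form recognition; table and graphics discrimination; form frame line removal; image skew detection and correction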
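Abstract: Forms are a common type of document, widely used in everyday work and life. With the development of computer technology, using computers to acquire, store, and manage the vast amount of information held in forms has increasingly become a focus of attention. Form recognition systems have begun to serve as an effective tool for replacing manual entry and automatically capturing form information. Addressing several problems that existing form recognition systems encounter in practice, this thesis investigates in depth the discrimination of tables from graphics, the removal of form frame lines from color bill images, and skew angle detection for grayscale and color form images, and obtains the following results:
1. Existing systems have a high misclassification rate when distinguishing tables from graphics. This thesis proposes a method that distinguishes tables from graphics using frame line and cell information. Drawing on the structural characteristics of tables, the method defines a set of constraints that frame lines and cells, as the constituent elements of a table, must satisfy, and it distinguishes tables from graphics by verifying whether each constraint holds. Experiments show that the method effectively reduces the misclassification rate.
2. Overlap between characters and lines seriously interferes with character segmentation and recognition. Earlier frame line removal algorithms based on binary images can eliminate the interference of frame lines with character recognition only to a limited extent. With rapid increases in computing speed and storage capacity, form recognition systems have begun to take grayscale and color scans as input. This thesis proposes a frame line removal algorithm based on color images; because it exploits color and grayscale information, it better eliminates the interference of frame lines with character recognition. The method has been successfully applied in a bank bill recognition system.
3. To address skew in grayscale and color bill images, this thesis proposes a method that detects the skew angle of a scanned image from the black borders produced during scanning. The method determines the skew angle from lines fitted to the four detected edges. Experiments show that it is fast and highly accurate, and that it applies to all grayscale and color scans of white (light-colored) rectangular paper. The method is currently used for skew correction and black border removal on color bank bill images and grayscale business card images.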
Other abstract: Forms are widely used to collect and distribute data in daily office operations. Using computers to capture, store, and manage large volumes of form documents is becoming more and more important. Form recognition systems are becoming an effective alternative to manual input for capturing form information. Addressing some problems in the application of form recognition systems, the author has studied three topics: distinguishing tables from graphics; removing form frame lines from color financial bill images; and detecting the skew angle of gray and color form images. The results are reported below.
1) To reduce classification errors, this paper presents a method to distinguish tables from graphics based on structural constraints on table frame lines and cells. Based on the structure of a table, the paper states a set of necessary restrictions that all frame lines and cells in a table must satisfy, and it verifies these restrictions to distinguish tables from graphics. Experiments show that this method is effective.
2) Characters often overlap form frame lines, and such overlapping seriously degrades character recognition. Almost all existing form frame line removal algorithms are based on binary images, and these algorithms have some limitations. This paper presents a new form frame line removal algorithm based on color images. Because it uses the color and gray information of the image, this method handles overlapping better. Its effectiveness is demonstrated by its application in a financial document recognition system.
3) To meet the needs of a financial document recognition system, this paper presents a new skew detection and correction method based on the black border of gray scan images of financial bills. The method determines the skew angle of a bill image from four lines fitted to the borders of the bill. Experiments show that this approach is fast, accurate, and effective. The algorithm can be extended and applied to other gray and color scan images of rectangular white paper.
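The border-based skew estimate in item 3 can be illustrated with a minimal Python sketch. The abstract states only that the angle is determined from four lines fitted to the bill's borders; the function names `fit_line_angle` and `skew_from_borders`, the total-least-squares fit, and the averaging of the four deviations are assumptions made here for illustration, not the thesis's actual algorithm. The sketch also assumes that the edge points (where the black scanner background meets the paper) have already been extracted.

```python
import math
import numpy as np

def fit_line_angle(points):
    """Orientation in degrees, folded into (-90, 90], of the
    total-least-squares line through a set of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    # The first right singular vector is the principal direction of the points.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    angle = math.degrees(math.atan2(dy, dx))
    # A line has no direction sign, so fold the angle into (-90, 90].
    if angle <= -90.0:
        angle += 180.0
    elif angle > 90.0:
        angle -= 180.0
    return angle

def skew_from_borders(top, bottom, left, right):
    """Hypothetical combination rule: average the deviation of each fitted
    border line from its ideal orientation (0 degrees for top/bottom,
    +/-90 degrees for left/right). The sign follows the coordinate
    system of the input points."""
    deviations = [fit_line_angle(top), fit_line_angle(bottom)]
    for pts in (left, right):
        a = fit_line_angle(pts)
        deviations.append(a - 90.0 if a > 0 else a + 90.0)
    return sum(deviations) / len(deviations)
```

Given edge points sampled from a rectangle rotated by a small angle, `skew_from_borders` returns approximately that angle; the image could then be rotated by the opposite angle to correct the skew. Using all four borders rather than one makes the estimate less sensitive to noise on any single edge, which is one plausible reason for fitting four lines.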
Pages: 54
Language: Chinese
Document type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/6218
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation
GB/T 7714
卜飞宇. 表格识别系统应用中若干问题的研究[D]. 中国科学院软件研究所. 中国科学院软件研究所,2004.
Files in this item
File name/size | Document type | Version type | Access type | License
LW014104.pdf (2562KB) | Restricted access