Institutional Repository of the Institute of Software, Chinese Academy of Sciences
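Title:
Key Technologies and Implementation of Large-Scale Digitization of Ancient Books
Author: 王晓波
Defense date: 2000
Major: Computer Application Technology
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Doctoral
Keywords: digitization of ancient books; XML standard; software environment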
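Abstract: Drawing on the author's direct participation in producing the electronic edition of the Wenyuange Siku Quanshu (《文渊阁四库全书》), this thesis examines the key technologies for large-scale digitization of ancient books and their implementation. It discusses the pre-processing and post-processing techniques surrounding OCR that make OCR practically usable in large-scale ancient-book digitization projects. Following the idea of feature extraction, which discards the fine details of an image, the author built a top-down layout analysis system and developed a series of proofreading tools for OCR post-processing. These brought the error rate of the digitized Siku Quanshu below one in ten thousand, meeting the publication standard for key publications. To measure OCR accuracy, the thesis proposes a distinctive method, proven effective in engineering practice, for assessing the reliability of OCR output, which is of both practical and theoretical interest. The thesis also discusses how Unicode was implemented in large-scale ancient-book digitization, covering display, full-text retrieval, and cross-platform support. Finally, to make digitized ancient books available over the Internet, it draws on the XML standard to construct a software environment for ancient-book digitization.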
English abstract: Summarizing the author's experience in the engineering of the digitization of the Wenyuange Siku Quanshu, this thesis discusses the key techniques for digitizing large-scale collections of ancient Chinese books and introduces their implementation. The thesis focuses on the processing techniques applied before and after OCR (PreOCR and PostOCR), which make OCR practically usable in large-scale book digitization projects. By extracting the major features of page images while ignoring minor details, the author constructs a top-down layout analysis system and develops several proofreading tools that achieve the national publication standard, with an error rate below 0.01%. Based on statistics of the OCR parameter DISTANCE, the author establishes a method for evaluating the reliability of OCR output. The method has proved useful in engineering practice and appears to be meaningful in theory. The thesis also introduces another major technique in large-scale digitization engineering: Unicode enablement for Chinese text display, full-text retrieval, and a single data set and single binary for multilingual Windows. Finally, the thesis describes how the author built a software environment based on XML technology for digitizing such large-scale collections of ancient books, in order to publish the electronic edition on the Internet.
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/5792
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name: LW002151.pdf
File Size: 1697KB
Access: Restricted (full text available on request)

Recommended Citation:
王晓波. Key Technologies and Implementation of Large-Scale Digitization of Ancient Books [D]. Institute of Software, Chinese Academy of Sciences. 2000-01-01.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences