Institutional Repository of the Institute of Software, Chinese Academy of Sciences
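Title:
WebOffice中文档格式化的研究
Author: 贺理
Defense date: 2007-06-01
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software
Degree: Doctoral
Keywords: document formatting; text line breaking; document pagination; ICU; Swing
Alternative title: Research on the Document Formatting of WebOffice System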
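Abstract (translated from the Chinese): Based on the development of the online office suite WebOffice, this dissertation studies the concepts, models, workflow, and system framework of document formatting, focuses on implementing two key components, text line breaking and document pagination, and reports five main results. First, the dissertation surveys the state of document formatting through the evolution of word-processing systems and analyzes client-side computing technology. Given the WebOffice architecture, it identifies the main problem that WebOffice document formatting must solve: formatting Web documents through computation on the browser side. Second, from the two perspectives of the logical structure model and the physical structure model, it analyzes the document object model, the box/glue model, and document layout. It describes in detail how a computer system understands the organization of Web documents, laying the foundation for the study of document formatting. Third, document formatting is the formatted presentation of document content; it involves document parsing, font parsing, and display layout, and is a fairly complex computation. Building on a description of the overall word-processing framework of WebOffice, the dissertation argues that the key to document formatting is the mapping from logical structure to physical structure. It then divides the WebOffice formatting workflow into three stages: HTML parsing and DOM element computation; line breaking and pagination; formatted output and browser display. Fourth, text line breaking is the most basic requirement of document formatting and also its essence. The dissertation therefore abstracts the line-breaking problem into two aspects: locating break opportunities and choosing a breaking strategy. ICU is an open-source internationalization library that implements the line-breaking properties and text boundary rules of the Unicode standard. The dissertation also analyzes in depth the line-by-line algorithm among the breaking strategies. On this basis it presents a BreakIterator-based line-breaking scheme, together with a second scheme based on Swing components. Each scheme has its strengths, and the two were adopted at different stages of WebOffice development. Fifth, document pagination is the main difference between office suites and other word-processing software and is a focus of the WebOffice formatting research. The dissertation abstracts pagination as line breaking in the vertical direction, so line-breaking strategies apply equally to pagination. Based on an analysis of the MVC design pattern of Java Swing components and starting from the document view, it proposes a pagination solution and describes its implementation details in top-down order.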
English abstract: Based on the practice of developing the WebOffice system, this dissertation studies the concepts, models, process, and system framework of document formatting. Two key aspects, line breaking and page breaking, are highlighted. The following five principal results have been obtained. First, the development of document formatting is summarized through the evolution of word-processing systems, and client-side computing technology is analyzed. For the WebOffice system, the main problem to be solved is implementing document formatting on the browser side. Second, the Document Object Model, the box/glue model, and document layout are analyzed from two perspectives: logical structure and physical structure. The organization of Web documents is described in detail, which lays the foundation for the study of document formatting. Third, document formatting, which transforms document content into a document display, involves document parsing, font parsing, and display layout, and is a complex computing process. Based on a description of the general framework of the WebOffice word-processing component, the key to document formatting is identified as the mapping from logical structure to physical structure. Furthermore, the document formatting task of WebOffice is divided into three steps: HTML document parsing and DOM element measurement; line breaking and page breaking; and formatted output to the browser window. Fourth, line breaking is the basic requirement and also the essence of document formatting. The line-breaking problem is therefore abstracted into two aspects: locating break opportunities and choosing a line-breaking strategy. ICU is a mature, widely used set of C/C++ and Java libraries that implements the line-breaking properties and text-boundary support defined by Unicode. The line-by-line algorithm among the line-breaking strategies is analyzed in depth. On this basis, a BreakIterator solution for line breaking is put forward. In addition, another solution based on Swing components is put forward. Both solutions have their own strengths and are used in different versions of WebOffice. Fifth, document pagination, the main difference between office suites and other word-processing software, is also required by WebOffice document formatting. Document pagination is treated as breaking in the vertical direction, by analogy with line breaking in the horizontal direction, so line-breaking strategies are also suitable for pagination. Based on an analysis of the MVC design pattern in Java Swing components, a document pagination solution focusing on the document View is put forward. The implementation details of this solution are described in top-down order.
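The fourth and fifth points of the abstract can be illustrated with a minimal Java sketch. It is not code from the dissertation. It assumes that text width can be approximated by a character count and that every line has the same height, and it uses the JDK's `java.text.BreakIterator` as a stand-in for the ICU class of the same name (`com.ibm.icu.text.BreakIterator` in ICU4J), which offers a similar interface. The names `GreedyBreaker`, `breakLines`, and `paginate` are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not taken from the dissertation.
class GreedyBreaker {

    // Line-by-line (greedy) breaking: extend the current line to the
    // furthest break opportunity that still fits within maxWidth.
    // Assumption: width is the character count, ignoring trailing spaces.
    static List<String> breakLines(String text, int maxWidth) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ROOT);
        it.setText(text);
        List<String> lines = new ArrayList<>();
        int lineStart = it.first(); // start of the current line
        int lastBreak = lineStart;  // last break opportunity seen on this line
        for (int brk = it.next(); brk != BreakIterator.DONE; brk = it.next()) {
            int width = text.substring(lineStart, brk).trim().length();
            // Break only if an earlier opportunity exists on this line;
            // a single over-long word is left to overflow on its own line.
            if (width > maxWidth && lastBreak > lineStart) {
                lines.add(text.substring(lineStart, lastBreak).trim());
                lineStart = lastBreak;
            }
            lastBreak = brk;
        }
        if (lineStart < text.length()) {
            lines.add(text.substring(lineStart).trim());
        }
        return lines;
    }

    // Pagination as line breaking in the vertical direction: the same
    // greedy strategy, with line heights measured against a page height.
    // Assumption: all lines share one fixed height.
    static List<List<String>> paginate(List<String> lines, int lineHeight, int pageHeight) {
        List<List<String>> pages = new ArrayList<>();
        List<String> page = new ArrayList<>();
        int used = 0; // vertical space consumed on the current page
        for (String line : lines) {
            if (used + lineHeight > pageHeight && !page.isEmpty()) {
                pages.add(page);
                page = new ArrayList<>();
                used = 0;
            }
            page.add(line);
            used += lineHeight;
        }
        if (!page.isEmpty()) {
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> lines = breakLines("Line breaking is the essence of document formatting.", 20);
        List<List<String>> pages = paginate(lines, 10, 25);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("Page " + (i + 1) + ": " + pages.get(i));
        }
    }
}
```

A real formatter would measure glyph widths with font metrics instead of counting characters, and would handle variable line heights. The sketch only shows how locating break opportunities (the iterator) is separated from the breaking strategy (the greedy loop), and how the same strategy carries over to pagination.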
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/7416
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name/ File Size Content Type Version Access License
10001_200428015029038贺理_paper.pdf(1257KB)----Restricted access-- Contact to obtain full text

Recommended Citation:
贺理. WebOffice中文档格式化的研究 [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2007-06-01.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences