Institutional Repository of the Institute of Software, Chinese Academy of Sciences
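Title: Validation and Query Implementation for XML Documents (XML文档的有效性验证和查询实现)
Author: Dai Beijie (戴蓓洁)
Defense date: 2007-06-03
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software
Degree: Ph.D.
Keywords: XML processor; validation; XPath; XML query; performance testing
Alternative title: Research on the Implementation of XML Validation and Query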
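Abstract: XML (eXtensible Markup Language) is a markup language defined by the W3C. It is widely used in e-commerce, B2B communication, enterprise information integration, Web services, and other applications. It has become one of the basic ways of organizing, storing, and exchanging information in networked environments. As XML is applied ever more widely, the performance demanded of XML parsing keeps rising.

This thesis builds on the existing ONCE XML Parser. It studies the characteristics of DTD (Document Type Definition)-based validation and of XML query languages. On that basis it implements ONCE XML Processor 1.0, which supports DTD-based validation and document querying that follows the XML Path Language (XPath) 1.0 specification. ONCE XML Processor 1.0 is designed around a lightweight system architecture and effective, practical data structures and algorithms, which give the system good configurability and extensibility. Its performance is also optimized at several levels, including the system architecture, the processing flow, and the programming language. The measures used include strategies based on statistical regularities, an optimized automaton implementation, and sensible resource allocation.

The validation in ONCE XML Processor 1.0 passes all of the XML/API conformance tests provided by the W3C. Across more than 2,000 XML test documents, our test program automatically checks whether ONCE XML Processor 1.0 handles validation in accordance with the XML specification. On the XML Test 1.1 suite provided by SUN, the validation performance of ONCE XML Processor 1.0 is on average about 40% higher than that of Xerces 2.9.0 and Woodstox 3.2.0. The document-query implementation of ONCE XML Processor 1.0 also passes the specification's functional-correctness tests, and in every case is more than twice as fast as Xalan-J 2.7.0. These results show that ONCE XML Processor 1.0 delivers efficient XML processing while remaining functionally complete.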
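The abstract credits part of the validation speed-up to an optimized automaton implementation but does not describe it. The sketch below illustrates the general technique only. It is not ONCE XML Processor's code.

A DTD element content model such as `(title, author+, (chapter | appendix)*)` can be compiled into a Glushkov (position) automaton. XML 1.0 declares it an error for a content model to be non-deterministic, so this automaton is a DFA, and each child element is checked with a single table lookup. The class name `ContentModel` and its methods are hypothetical. The sketch handles only element content (no `EMPTY`, `ANY`, or mixed `#PCDATA` models) and restricts names to an ASCII initial character.

```python
import re

PUNCT = set("(),|?*+")
TOKEN = re.compile(r"\s*([A-Za-z_:][\w.:-]*|[(),|?*+])")
START = -1  # DFA start state; every other state is a Glushkov position


def tokenize(spec):
    """Split a content model such as '(a, b+)' into names and punctuation."""
    spec, tokens, pos = spec.strip(), [], 0
    while pos < len(spec):
        m = TOKEN.match(spec, pos)
        if not m:
            raise ValueError(f"unexpected text in content model: {spec[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class ContentModel:
    """Glushkov (position) automaton for a DTD element-content model (sketch)."""

    def __init__(self, spec):
        self.tokens, self.i = tokenize(spec), 0
        self.names = []   # position -> element name
        self.follow = {}  # position -> positions that may come next
        self.nullable, self.first, self.last = self._cp()
        if self.i != len(self.tokens):
            raise ValueError("trailing tokens after content model")
        self.table = self._build_table()

    def _peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def _next(self):
        tok = self._peek()
        if tok is None:
            raise ValueError("unexpected end of content model")
        self.i += 1
        return tok

    def _cp(self):
        """Parse one content particle and return (nullable, first, last)."""
        tok = self._next()
        if tok == "(":
            nullable, first, last = self._group()
        elif tok not in PUNCT:
            pos = len(self.names)
            self.names.append(tok)
            self.follow[pos] = set()
            nullable, first, last = False, {pos}, {pos}
        else:
            raise ValueError(f"unexpected {tok!r}")
        if self._peek() in ("?", "*", "+"):
            suffix = self._next()
            if suffix in ("*", "+"):  # repetition: last positions loop back to first
                for p in last:
                    self.follow[p] |= first
            if suffix in ("?", "*"):  # the particle may match zero times
                nullable = True
        return nullable, first, last

    def _group(self):
        """Parse the rest of '(cp, cp, ...)' or '(cp | cp | ...)'."""
        parts = [self._cp()]
        sep = self._peek()
        if sep in (",", "|"):
            while self._peek() == sep:
                self._next()
                parts.append(self._cp())
        if self._next() != ")":
            raise ValueError("expected ')' (a group cannot mix ',' and '|')")
        if sep == "|":  # choice: union of the alternatives
            return (any(p[0] for p in parts),
                    set().union(*(p[1] for p in parts)),
                    set().union(*(p[2] for p in parts)))
        nullable, first, last = True, set(), set()  # sequence: concatenate left to right
        for n, fst, lst in parts:
            for p in last:  # positions that can end the prefix may be followed by fst
                self.follow[p] |= fst
            first = first | fst if nullable else first
            last = lst | last if n else set(lst)
            nullable = nullable and n
        return nullable, first, last

    def _build_table(self):
        """Turn the first/follow sets into a DFA transition table."""
        table = {}
        for state, targets in [(START, self.first), *self.follow.items()]:
            for q in targets:
                key = (state, self.names[q])
                if table.setdefault(key, q) != q:
                    raise ValueError(f"non-deterministic content model at {key[1]!r}")
        return table

    def accepts(self, children):
        """Check a sequence of child element names against the model."""
        state = START
        for name in children:
            state = self.table.get((state, name))
            if state is None:
                return False
        return state in self.last or (state == START and self.nullable)
```

The following usage has been traced by hand:

- `ContentModel("(title, author+, (chapter | appendix)*)").accepts(["title", "author", "chapter"])` returns `True`, while `["title", "chapter"]` is rejected.
- A model such as `((a, b) | (a, c))` raises `ValueError` at construction, because `a` cannot be matched deterministically.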
English abstract: XML (eXtensible Markup Language) is a markup language recommended by the W3C (World Wide Web Consortium). It is widely used in e-business, B2B communication, enterprise information integration, Web services, and other areas, and has become one of the fundamental methods of organizing, storing, and exchanging information in networked environments. As XML applications multiply, parsing performance has become a challenge for most XML processors.

This thesis describes the design and implementation of ONCE XML Processor 1.0, which builds on our earlier ONCE XML Parser. After analyzing the characteristics of DTD-based validation and of XML query languages, we implement the validity constraints and provide query APIs conforming to XPath (the XML Path Language 1.0 specification). ONCE XML Processor 1.0 adopts a lightweight system architecture and efficient data structures and algorithms, which make the system configurable and extensible. We also optimized the performance of ONCE XML Processor 1.0 through a series of strategies: a statistics-based implementation, an optimized automaton implementation, sensible resource allocation, and several performance improvements at the programming-language level.

The validation module of ONCE XML Processor 1.0 passes all the conformance tests provided by the W3C. Our test suite automatically runs more than 2,000 conformance test cases, and the results show that ONCE XML Processor 1.0 fully conforms to the XML specification. We also used SUN's XML Test 1.1 to compare the performance of our validation module with two other popular XML processors, Xerces 2.9.0 and Woodstox 3.2.0. The results show that the validation performance of ONCE XML Processor 1.0 is about 40% higher than that of Xerces and Woodstox.

We have also tested the XPath module of ONCE XML Processor 1.0. It passes all the functional tests, and its XML query performance is more than twice that of Xalan. These results show that the design and implementation of ONCE XML Processor 1.0 are both efficient and functionally complete.
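The record does not describe the query API of the XPath module. The Python sketch below shows only the general way an XPath 1.0 location path is evaluated: step by step, with each step mapping the current node-set to a new, duplicate-free node-set in document order. It works over `xml.etree.ElementTree`.

The names `evaluate`, `parse_path`, and `DOC` are hypothetical, and only a tiny subset of XPath 1.0 is handled:

- absolute paths built from `/` and `//` steps;
- element name tests or `*`;
- at most one `[@attr='value']` predicate per step.

There are no namespaces, other axes, functions, or positional predicates. It is not ONCE XML Processor's API.

```python
import re
import xml.etree.ElementTree as ET

# One step: '/' or '//', a name test (or '*'), and an optional [@attr='value'] predicate.
STEP = re.compile(r"(//|/)([A-Za-z_][\w.-]*|\*)(?:\[@([A-Za-z_][\w.-]*)='([^']*)'\])?")
DOC = object()  # stand-in for the XPath root (document) node, which ElementTree lacks


def parse_path(path):
    """Split an absolute path like //book[@lang='en']/title into steps."""
    steps, pos = [], 0
    while pos < len(path):
        m = STEP.match(path, pos)
        if not m:
            raise ValueError(f"unsupported syntax at: {path[pos:]!r}")
        steps.append(m.groups())  # (axis, name test, attribute or None, value or None)
        pos = m.end()
    if not steps:
        raise ValueError("empty path")
    return steps


def evaluate(root, path):
    """Evaluate the path step by step, keeping each node-set in document order."""
    order = {id(el): i for i, el in enumerate(root.iter())}
    context = [DOC]
    for axis, name, attr, value in parse_path(path):
        found = {}
        for node in context:
            if axis == "/":  # child axis
                cands = [root] if node is DOC else list(node)
            else:  # '//' without positional predicates selects the same nodes as descendant::
                cands = list(root.iter()) if node is DOC else list(node.iter())[1:]
            for el in cands:
                if name != "*" and el.tag != name:
                    continue
                if attr is not None and el.get(attr) != value:
                    continue
                found[id(el)] = el  # keyed by identity, so duplicates collapse
        context = sorted(found.values(), key=lambda el: order[id(el)])
    return context
```

For example, take a `<library>` with one English-language `<book>` directly under it and another inside a `<shelf>`. On that document, `evaluate(root, "//book[@lang='en']/title")` returns both `title` elements in document order. Sorting by a precomputed document-order index keeps each intermediate result deterministic, which mirrors how XPath 1.0 results are conventionally returned.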
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/5908
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File name: 10001_200428015029017戴蓓洁_paper.doc (1316 KB)
Access: Restricted; contact the repository to obtain the full text

Recommended Citation:
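Dai Beijie (戴蓓洁). XML文档的有效性验证和查询实现 [Validation and Query Implementation for XML Documents] [D]. Institute of Software, Chinese Academy of Sciences, 2007-06-03.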

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.
Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences