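Title: | Data Replication and Query Optimization for Data Integration (面向数据集成的数据复制和查询优化) |
Author: | Chen Yi (陈义) |
Issued Date: | 2004 |
Major: | Computer Application Technology |
Degree Grantor: | Institute of Software, Chinese Academy of Sciences |
Place of Degree Grantor: | Institute of Software, Chinese Academy of Sciences |
Degree Level: | Doctoral |
Keywords: | data integration; frequent structure mining |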
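Abstract: | Data integration is one of the active research topics in the database field. Integrating information on the Internet is a complex and demanding task that draws on artificial intelligence, advanced database and knowledge-base systems, distributed information systems, information retrieval, and human-computer interaction. As XML has gradually become the standard for representing and exchanging data on the Internet, the focus of data integration research has shifted, and XML-based data integration systems have become a research hotspot. Set against the background of XML-based data integration, this dissertation studies XML query processing and optimization, the selective replication of XML data, and the updating of replicated data. It makes the following contributions. It investigates query processing in XML-based data integration systems in depth and presents a method that uses schema information to optimize queries. Given the characteristics of data integration in the Internet environment, the analysis of user query logs is one of the main topics of the dissertation: by analyzing these logs and replicating the portions of the data that users access most frequently, unnecessary network data transfer can be eliminated and system performance improved. The dissertation applies frequent structure mining to obtain the distribution of user queries and to determine automatically which portion of the data most warrants replication. It studies in depth the mining of ordered and of unordered XML query log databases and gives two efficient algorithms for these mining problems; experimental results show that the methods significantly improve mining efficiency. It proposes an incremental algorithm for discovering query patterns, so that updates to the dataset can be handled at the lowest possible cost, together with a method for dynamically determining the interval between mining runs, which makes the incremental mining algorithm easier to use. Finally, it examines a further problem related to data replication, namely how to maintain materialized views efficiently: it gives a method for deciding whether an XML view is self-maintainable and proposes a self-maintenance algorithm. |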
English Abstract: | Data integration is an important direction in the field of database research. With the growing importance of XML as a data exchange and storage format, researchers have gradually shifted their focus to XML-based data integration, and intensive interest has arisen in how to manage and retrieve XML information efficiently in XML-based data integration systems. Against this background, the dissertation explores data replication and query processing in XML-based data integration systems. Its key contributions are as follows. The dissertation investigates how to optimize XML queries using schema information, and presents a path-expansion algorithm that lowers the processing overhead of non-deterministic regular path expressions in ordered XML queries. An effective way to improve the performance of data integration systems is to discover frequent XML query patterns and replicate frequently accessed attributes or documents. The dissertation presents two efficient algorithms that discover frequently occurring ordered and unordered XML query patterns, respectively. Experiments show that these algorithms yield significant performance gains. The dissertation proposes an algorithm that discovers frequent XML query patterns incrementally, and presents a method for dynamically choosing the intervals at which the incremental mining algorithm should be re-run. Finally, the dissertation gives a necessary and sufficient condition for the self-maintainability of materialized XML views, and presents an algorithm for the self-maintenance of XML views that preserves the consistency of replicated data with the data sources when the base data change. |
Language: | Chinese |
Content Type: | Thesis |
URI: | http://ir.iscas.ac.cn/handle/311060/7656 |
Appears in Collections: | Institute of Software, CAS (中科院软件所) |
File Name/File Size | Content Type | Version | Access | License |
LW013942.pdf (3108KB) | -- | -- | Restricted (contact the repository for the full text) | -- |
Recommended Citation: |
Chen Yi. Data Replication and Query Optimization for Data Integration [D]. Institute of Software, Chinese Academy of Sciences, 2004-01-01.