Institutional Repository of the Institute of Software, Chinese Academy of Sciences
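Title:
OnceDI中可视化ETL工具的设计与实现 (Design and Implementation of a Visualized ETL Tool in OnceDI)
Author: Zhao Di (赵迪)
Defense date: 2008-06-05
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software
Degree: Doctoral
Keywords: data integration; ETL; data transformation; middleware
Alternative title: Design and Implementation of Visualized ETL System in OnceDI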
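Abstract: With the rapid development of network technology and the deepening of enterprise informatization, the data, information, and knowledge distributed across an enterprise have become more diverse and more complex, and enterprise information systems have become more open. How to integrate and share this data, information, and knowledge has become a key problem. Data integration technology addresses exactly this need, enabling dynamic, flexible, real-time integration and sharing of distributed, heterogeneous, and complex data, information, and knowledge. OnceDI 2.0 solves the data-level interoperability problem among heterogeneous data sources well: it meets a variety of data integration requirements, works across platforms and across many kinds of data sources, offers practical mechanisms such as incremental transfer and conflict resolution, and provides comprehensive security and management tools. It also has shortcomings, however: the receiving data source can only be defined from the data block received, by which point the data has already been sent; and the field correspondence between the sending and receiving data sources must be built entirely by hand. The goal of data integration is to give users a unified application interface for accessing multiple distributed, independent, heterogeneous data sources. Visual configuration of the ETL (Extract-Transform-Load) process raises questions such as how to help users understand the ETL process better and how to let them configure, manage, and execute it more effectively and easily. Building on a study of the characteristics of the data integration process, and focusing on the problems of the visualized ETL process in data integration, the thesis settles on data transformation and data filtering as its research directions. For data transformation, the thesis works on two fronts, schema matching and instance transformation. For schema matching, it proposes an ontology-assisted automatic schema matching algorithm with three parts: computing attribute-name matches by combining decision tree learning with the WordNet lexical ontology; defining an attribute data type ontology to handle matching of attributes with data types; and using a domain ontology to build indirect mappings between attributes, which resolves one-to-many semantic matches. This approach makes the visual data transformation process simpler to operate and yields automatic matching results that users find more satisfactory. For instance transformation, the thesis proposes a design for an instance transformation tool with a friendlier interface that, more importantly, makes instance-level transformation operations clearer and simpler for users. For data filtering, the thesis starts from the characteristics of setting data quality control conditions and proposes a design for a tool that sets such conditions. Finally, the thesis designs and implements the visualized ETL tool for data integration on top of the OnceDI 3.0 data integration model and the three-tier OnceDI 3.0 architecture (client, control center, DI server), applying design patterns in the design to improve the system's extensibility.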
English abstract: With the rapid development of network technologies and enterprise information technologies, the data, information, and knowledge distributed in enterprises are becoming more diverse and complex. Meanwhile, enterprise information systems are becoming more open. The distributed, heterogeneous, and sophisticated data, information, and knowledge within enterprises therefore need to be integrated and shared. Data integration technology is what makes dynamic, flexible, real-time integration and sharing of such data, information, and knowledge possible. OnceDI 2.0 handles many of these problems well: it supports data-level interoperation among heterogeneous data sources, meets many kinds of requirements, supports different platforms and different data sources, provides incremental delivery, resolves many kinds of conflicts, and offers comprehensive tools for security and management. However, it has some flaws. The receiving data source can only be defined from the received data block, by which time the data sending process has already finished. Column matching between the sending data source and the receiving data source can only be done entirely by hand. There are other such issues as well. The goal of data integration is to provide a unified application interface for users accessing distributed, heterogeneous, and sophisticated data sources. The visualized ETL process raises problems such as how to help users understand the ETL process well and how to let users configure, manage, and execute it more efficiently and easily. After studying the characteristics of data integration and focusing on the issues that arise during the ETL process, the thesis sets data transformation and data filtering as its research directions. For data transformation, the thesis proposes a combined strategy of schema matching and instance transformation.
For schema matching, the thesis proposes a schema matching algorithm that makes data transformation in the ETL process easier and friendlier. For instance transformation, the thesis presents a solution that makes instance-based transformation clearer and easier. Furthermore, the thesis proposes two kinds of tools to resolve the data filtering problem. For the implementation, based on the data integration models above and the three-tier architecture of OnceDI 3.0, the thesis presents the design and implementation of the visualized ETL tool in OnceDI 3.0. A series of design patterns is applied in the design to enhance the extensibility of the system.
Language: Chinese
Content type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/6664
Appears in Collections: Software Engineering Technology Research and Development Center (软件工程技术研究开发中心), Theses

Files in This Item:
File Name / File Size | Content Type | Version | Access | License
10001_200528015029034赵迪_paper.pdf (2612KB) | -- | -- | Restricted access | --

Recommended Citation:
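Zhao Di (赵迪). OnceDI中可视化ETL工具的设计与实现 (Design and Implementation of a Visualized ETL Tool in OnceDI) [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2008-06-05.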

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
