Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Title:
Research on Semantics-Based Integration Techniques for Heterogeneous Information Content (基于语义的异构信息内容集成技术研究)
Author: Li Jian (李剑)
Issued Date: 2007-01-19
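Degree Grantor: Institute of Software, Chinese Academy of Sciences
Place of Degree Grantor: Institute of Software
Degree Level: Doctoral (Ph.D.)
Keyword: information content integration; Semantic Web; ontology; query division and transformation; integrity constraints; schema integration; information portal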
Alternative Title: Semantic Integration on Heterogeneous Information Sources
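Abstract: With the development of network technology, database technology, and other technologies for storing and accessing information content, the range of information that users need to access and are able to access keeps expanding, and the amount of information content keeps growing. This content may reside in distributed computer systems connected by networks, and it differs in storage location, representation, and access method. Users' demand for integrating information content from diverse distributed and heterogeneous sources is therefore growing steadily.
Distributed information content integration means combining distributed, heterogeneous information content and presenting it to users in a uniform representation, so that users can access content from different sources and in different representations through a uniform access interface and a uniform way of retrieving information. Semantics-based distributed information content integration is the approach in which a unified semantic model is used to represent the information content of the distributed sources.
There has been a great deal of research on distributed information content integration, particularly on semantics-based integration, and it has produced some results. This research nevertheless has the following shortcomings: there is no generally applicable method for semantically integrating multiple types of information content, where the sources include structured, semi-structured, and unstructured content; there is no method for dividing a global query over the integrated content into local queries and transforming them according to the mappings established during integration; among the methods that ensure integrated content satisfies integrity constraints, the incompleteness of the integrated content has not been addressed; semantics-based integration has not been combined with XML-schema-based integration, and the related techniques have not been studied; semantic integration of unstructured web pages and text documents, and the related semantic search techniques, have received little study; and there is no semantics-based method for integrating distributed RDF(S) descriptions that contain information content and are related to one another. Starting from these problems, this thesis studies the corresponding solutions and key techniques, and implements part of a semantic information content integration system.
The semantics-based heterogeneous information content integration studied in this thesis covers several types of information sources: relational data, XML documents, RDF(S) descriptions, and web pages and text documents. The thesis presents an overall framework with a three-layer structure: the bottom layer consists of the various distributed information sources; the integration layer integrates their content using an ontology that represents the global domain model; and in the information content publishing layer an information portal serves as the interface through which users access and retrieve the integrated results. The proposed model for mapping the ontology to local information schemas gives a uniform representation of semantics-based information mapping and integration.
The key problem in ontology-based integration of structured (relational) and semi-structured (XML) content is how to divide and transform a query over concept instances of the global ontology into queries over local data. The thesis proposes an operational representation of ontology concept-instance queries and, based on it, a method for dividing a global query into local queries, together with a method for transforming each local query into a native query for the corresponding local source. The results of the local queries are integrated and transformed, then returned to the user in a uniform form. Querying the integrated sources with this method yields the correct results the user needs.
In semantic information content integration, the distributed data being integrated may violate globally defined integrity constraints. By rewriting the user's query, results that satisfy the integrity constraints can be obtained from the rewritten query. Consistent query rewriting attaches restricting conditions to the query so that it discards results that would violate the integrity constraints. The complete query rewriting proposed in this thesis attaches supplementary conditions that retrieve additional results, thereby resolving integrity-constraint violations caused by incompleteness of the integrated content. Composite query rewriting combines consistent and complete query rewriting according to a partition of the set of integrity constraints; choosing a particular partition lets the data returned by the rewritten query satisfy the user's preferences.
In semantics-based integration of distributed XML, a local XML source can itself be a virtual XML source formed by integrating several local XML documents. Because local XML sources change dynamically, the integrated XML schema must change as local XML schemas are added or removed. The thesis proposes a model that describes the well-formed matching mappings between the integrated XML schema and the local XML schemas, together with a method for modifying the integrated schema when local schemas are added or removed. The method preserves the inclusion well-formedness of the matching mappings during schema integration and thus guarantees correct transformation of local XML data into integrated XML data.
To integrate distributed RDF(S) descriptions, the thesis proposes a distributed RDF(S) model that can describe the relationships among distributed RDF(S) descriptions and, based on this model, a method for querying distributed RDF(S) that supports queries at both the instance level and the concept level. On top of the distributed RDF(S) model and query method, distributed RDF(S) descriptions can be integrated on the basis of a global ontology.
In semantics-based integration of web pages and text documents, the thesis proposes a semantic search technique based on a semantic index, in order to obtain more accurate search results for web pages and text. The semantic index records how relevant each integrated document is to the concepts and relations in the ontology, so that documents related to a semantic concept can be looked up. A user's semantic search request is divided into a part searched against the semantic index and a part searched against the keyword index, and the results of the two parts are combined according to certain rules. The resulting semantic search achieves satisfactory recall while maintaining a certain level of precision.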
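OncePortal is an information portal we implemented for integrating information and applications; it can serve as the user access and information publishing mechanism in semantics-based heterogeneous information content integration. Users can thus access the semantically integrated content through the uniform OncePortal interface, enter semantic queries or search requests over the integrated content, and receive the results of those queries or searches as OncePortal output pages.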
English Abstract: With the development of network technologies, database technologies, and other information storage and access technologies, the scope of information content that users need and are able to access keeps expanding, and its volume keeps increasing. This information content may be stored in distributed computer systems connected by networks, and it differs in storage location, representation, and access method. As a result, the demand for integrating information content from distributed and heterogeneous sources is growing. Distributed information content integration means integrating distributed and heterogeneous information content from different sources and presenting it to users in a uniform representation, so that users can access content in different sources in the same way. Integrating the content of different sources by means of a global semantic model that represents it is called semantic integration of information content. There has been much research on distributed information content integration, especially on semantic integration of information content, and it has achieved some results.
However, this research still has several shortcomings: there is no general method for integrating various kinds of information content, which may include structured, semi-structured, or unstructured information; there is no method for dividing a global query into local queries according to the mappings between the semantic model and the distributed information content; there is no method for handling the incompleteness of semantically integrated information content; semantic integration of information content has not been combined with XML-schema-based information integration and the related technologies; semantic integration of web pages and text documents, and semantic search technologies, have received little research; and there is no semantic integration method for distributed RDF(S) descriptions that are related to one another. This thesis studies semantic integration of heterogeneous information content that covers various kinds of information, including relational data, XML documents, RDF(S) descriptions, and web pages and text documents. The semantic integration system has three layers: the bottom layer consists of distributed and heterogeneous information sources; the middle layer uses a global ontology to integrate the heterogeneous information content; and the upper layer uses an information portal as the user interface for accessing the integrated information content. This thesis presents a mapping model between the global ontology and heterogeneous information content that expresses a uniform way of integrating different kinds of information content. The difficult problem in ontology-based integration of relational data and XML documents is how to reformulate a global query into multiple subqueries over the local sources. This thesis presents a uniform expression for concept-instance queries and a method for dividing that query expression into local ones. All local query results can then be integrated to meet the users' needs.
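The query division and transformation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's actual algorithm: the `Mapping` and `ConceptQuery` classes, the example concept `Employee`, the source names, and the convention that every local container exposes an instance key named `id` are all hypothetical. Each (concept, property) pair is mapped to one local container; the global query is grouped by container, each group is turned into an SQL string or an XPath expression, and the local results are joined on the instance key.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical mapping entry: one ontology property of a concept is stored in
# one local source, in a table/column (relational) or element/child (XML).
@dataclass(frozen=True)
class Mapping:
    source: str     # local source name
    kind: str       # "sql" for relational sources, "xml" for XML sources
    container: str  # table name or XML element name
    column: str     # column name or child element name


@dataclass
class ConceptQuery:
    concept: str                                    # global ontology concept
    properties: list                                # properties to return
    conditions: dict = field(default_factory=dict)  # property -> required value


def divide(query, mappings):
    """Divide a global concept-instance query into one local query per container."""
    parts = defaultdict(lambda: {"columns": [], "conds": {}})
    for prop in query.properties:
        m = mappings[(query.concept, prop)]
        parts[(m.source, m.kind, m.container)]["columns"].append(m.column)
    for prop, value in query.conditions.items():
        m = mappings[(query.concept, prop)]
        parts[(m.source, m.kind, m.container)]["conds"][m.column] = value
    return dict(parts)


def to_native(key, part):
    """Transform one local query into the local source's own query language.

    Values are inlined only for readability; real code should use
    parameterized queries instead of string formatting.
    """
    _source, kind, container = key
    if kind == "sql":
        sql = f"SELECT {', '.join(['id'] + part['columns'])} FROM {container}"
        if part["conds"]:
            sql += " WHERE " + " AND ".join(
                f"{c} = '{v}'" for c, v in part["conds"].items())
        return sql
    # XML: select the matching elements; extracting the mapped children is not shown.
    return f"//{container}" + "".join(
        f"[{c}='{v}']" for c, v in part["conds"].items())


def merge(local_results):
    """Join local result rows on `id`, keeping only instances found in every result."""
    merged = None
    for rows in local_results:
        by_id = {row["id"]: row for row in rows}
        if merged is None:
            merged = by_id
        else:
            merged = {i: {**merged[i], **by_id[i]} for i in merged if i in by_id}
    return list((merged or {}).values())


# Example mapping model and global query (all names hypothetical).
MAPPINGS = {
    ("Employee", "name"): Mapping("hr_db", "sql", "staff", "full_name"),
    ("Employee", "dept"): Mapping("hr_db", "sql", "staff", "dept_code"),
    ("Employee", "email"): Mapping("dir_xml", "xml", "person", "mail"),
}
QUERY = ConceptQuery("Employee", ["name", "email"], {"dept": "R&D"})

for key, part in divide(QUERY, MAPPINGS).items():
    print(key[0], to_native(key, part))
```

Here the global query asks for the name and e-mail of employees in one department; it is divided into an SQL query against the relational source (which also carries the condition) and an XPath query against the XML source, and `merge` performs the final integration step as an inner join on the instance key.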
With these query division methods, users can obtain the correct results they want from distributed information sources. In ontology-based heterogeneous information content integration, data from distributed information sources may not satisfy global integrity constraints. Rewriting the original query and evaluating the rewritten query over the distributed information yields data that satisfy the integrity constraints. Consistent query rewriting obtains correct results by discarding the data that cause conflicts. Complete query rewriting retrieves additional data so that the results satisfy the integrity constraints. Composite query rewriting combines consistent and complete query rewriting according to a partition of the global integrity constraint set; with a suitable partition, it returns the query results the user prefers. Several local XML sources integrated on the basis of their schemas can be regarded as one virtual information source that participates in semantic integration of information content. The virtual XML schema must change when the local XML sources change dynamically. This thesis presents a model that describes the mapping relations between the integrated XML schema and the local XML schemas and supports well-formed data translation. It also presents a method for changing the integrated XML schema when local sources are added or deleted, which preserves the inclusion well-formedness of the mappings and ensures correct data translation from the local XML sources to the virtual XML source. To integrate distributed RDF(S) descriptions semantically, this thesis presents a distributed RDF(S) model that describes the descriptions and the relations between them. Based on this model, it also presents a method for querying distributed RDF(S) descriptions that can retrieve data at the concept level as well as at the instance level. Based on these methods, distributed RDF(S) descriptions can be semantically integrated.
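The difference between consistent and complete query rewriting for integrity constraints can be illustrated on a toy example. The sketch below is an assumption-laden illustration, not the thesis's formalism: it uses two hypothetical constraints on integrated data, a key constraint `id -> dept` on `Employee` and an inclusion constraint `Employee.dept ⊆ Dept`, and evaluates the rewritten queries with Python set comprehensions; the SQL in the docstrings shows one way the restricting and supplementary conditions could be written. The composite rewriting assigns the key constraint to consistent rewriting and the inclusion constraint to complete rewriting, which is only one possible partition of the constraint set.

```python
# Hypothetical integrated data: Employee(id, dept) tuples gathered from two
# sources, and the department names delivered by a Dept source.
EMPLOYEES = {(1, "R&D"), (1, "Sales"), (2, "R&D"), (3, "HR")}
DEPARTMENTS = {"R&D", "Sales"}


def consistent_employees(rows):
    """Consistent rewriting for the key constraint id -> dept.

    Restricting condition: keep a tuple only if no other tuple has the same id
    with a different dept, i.e. in SQL:
      SELECT id, dept FROM Employee e WHERE NOT EXISTS (
        SELECT 1 FROM Employee e2 WHERE e2.id = e.id AND e2.dept <> e.dept)
    """
    return {(i, d) for (i, d) in rows
            if not any(j == i and e != d for (j, e) in rows)}


def complete_departments(depts, rows):
    """Complete rewriting for the inclusion constraint Employee.dept ⊆ Dept.

    Supplementary condition: also return departments that employees refer to
    but the Dept source is missing, i.e. in SQL:
      SELECT name FROM Dept UNION SELECT dept FROM Employee
    """
    return set(depts) | {d for (_, d) in rows}


def composite(depts, rows):
    """Composite rewriting for one partition of the constraint set: the key
    constraint is handled consistently, and the inclusion constraint is then
    handled completely over the consistent employee tuples."""
    employees = consistent_employees(rows)
    return employees, complete_departments(depts, employees)


print(consistent_employees(EMPLOYEES))
print(complete_departments(DEPARTMENTS, EMPLOYEES))
print(composite(DEPARTMENTS, EMPLOYEES))
```

Employee 1 has conflicting departments, so the consistent rewriting drops both of its tuples, while the complete rewriting adds the department `HR`, which is referenced by an employee but absent from the Dept source.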
In semantic integration of web pages and text documents, this thesis presents a text search method based on a semantic index to make search more accurate. The semantic index records the relevance of documents to the concepts and roles in the ontology. A user's search is divided into a semantic-index search and a keyword-index search, each performed on its own index, and the two sets of results are combined to produce the semantic search results the user requires. With this method, semantic search achieves better recall and precision than keyword-based search. OncePortal is an information portal for integrating information and applications. It serves as the user access and information publishing mechanism in semantic integration of information content: users access the semantically integrated content uniformly through OncePortal, enter semantic queries or search requests in OncePortal pages, and receive the query or search results there.
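A minimal sketch of combining a semantic index with a keyword index follows. It assumes a toy semantic index that stores a relevance score per (concept or role, document) pair and an ordinary inverted keyword index. The index contents, the document ids, the rule for dividing a request (terms that name an indexed concept go to the semantic index, all others to the keyword index), and the weighted-sum combination with parameter `alpha` are all hypothetical; the abstract states only that the two sets of results are combined.

```python
# Hypothetical semantic index: ontology concept or role -> {doc_id: relevance in [0, 1]}.
SEMANTIC_INDEX = {
    "Database": {"d1": 0.9, "d2": 0.4},
    "Ontology": {"d2": 0.8, "d3": 0.7},
}
# Hypothetical inverted keyword index: term -> set of doc_ids containing it.
KEYWORD_INDEX = {
    "integration": {"d1", "d2"},
    "xml": {"d3"},
}


def split_request(terms):
    """Divide a search request into a semantic part and a keyword part."""
    semantic = [t for t in terms if t in SEMANTIC_INDEX]
    keywords = [t for t in terms if t not in SEMANTIC_INDEX]
    return semantic, keywords


def semantic_search(terms, alpha=0.6):
    """Search both indexes and combine the results with a weighted sum.

    alpha weights the semantic part and 1 - alpha the keyword part; this
    combination rule is an assumption made for illustration.
    """
    semantic, keywords = split_request(terms)
    scores = {}
    for concept in semantic:
        for doc, relevance in SEMANTIC_INDEX[concept].items():
            scores[doc] = scores.get(doc, 0.0) + alpha * relevance / len(semantic)
    for term in keywords:
        for doc in KEYWORD_INDEX.get(term, set()):
            scores[doc] = scores.get(doc, 0.0) + (1 - alpha) / len(keywords)
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


print(semantic_search(["Ontology", "integration"]))  # ['d2', 'd3', 'd1']
```

Document `d2` ranks first because it is both highly relevant to the concept `Ontology` and matches the keyword `integration`; `d3` is found only through the semantic index, so a pure keyword search for the same request would have missed it.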
Language: Chinese
Content Type: Doctoral dissertation
URI: http://ir.iscas.ac.cn/handle/311060/7132
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name / File Size | Content Type | Version | Access | License
10001_200118015004951李剑_paper.doc (7449 KB) | -- | -- | Restricted access (contact the repository to obtain the full text) | --

Recommended Citation:
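Li Jian. Research on Semantics-Based Integration Techniques for Heterogeneous Information Content [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2007-01-19.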

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.
Copyright © 2007-2021 Institute of Software, Chinese Academy of Sciences