Institutional Repository of the Institute of Software, Chinese Academy of Sciences
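Title:
Research and Implementation of a Chinese Structured Information Retrieval System (中文结构化信息检索系统的研究与实现)
Author: Zhang Yibo (张毅波)
Defense date: 2001
Major: Computer Software and Theory
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Doctoral
Keywords: information retrieval; structured query; association matrix; query expansion; Chinese information processing
Alternative title: Research on and Implementation of Chinese Structured Information Retrieval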
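Abstract: This dissertation covers two topics, Chinese information retrieval and structured information retrieval, and studies four aspects of them: computing the similarity between documents and queries in a Chinese information retrieval system, query expansion, query translation, and structured information retrieval. Its main contributions are as follows. (1) It argues that word-based indexing is the indexing method a Chinese information retrieval system must adopt. It systematically presents a new method for computing the association between terms, the PM-based term-pair weighting method, and introduces it into the similarity computation between documents and queries. (2) It studies how the proximity values and mutual information values of term pairs affect retrieval performance in a Chinese information retrieval system. Experiments show that when the mutual information of term pairs is computed with low precision, proximity values improve retrieval performance more than mutual information values do. (3) It proposes a query expansion method based on the term association matrix computed from local information. Within the top-ranked documents retrieved by the initial query, top-ranked terms and their weights are computed following the idea of automatic thesaurus construction under the second-order association hypothesis, and are then added to the initial query to expand it. (4) It proposes a query translation method based on mutual information between terms. The method offers a new way to select translations for query terms, indirectly preserves the phrase information in the query through the terms' association sequences, and constructs a target-language query whose terms carry weights. (5) It analyzes how the structural information in XML documents can be used to improve the retrieval performance of a traditional information retrieval system. By introducing a document structure index, an element index, and an attribute index, it implements structured queries over XML documents. The Chinese structured information retrieval system CSIR is designed, and some of its main functions are implemented.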
English abstract: The central themes of this dissertation are Chinese information retrieval and structured information retrieval. Four aspects of these themes are investigated: the similarity calculation between document and query, query expansion, query translation, and structured information retrieval. The main research contributions of the dissertation are as follows. (1) An argument that word-based indexing is the indexing method a Chinese information retrieval system must employ. A new method, PM-based weight calculation of term pairs, is presented systematically for computing the association between terms, and is introduced into the similarity calculation between document and query. (2) An investigation of how the proximity and the mutual information of term pairs affect retrieval performance in a Chinese information retrieval system. The experimental results show that the proximity of term pairs improves retrieval performance more than their mutual information does when the latter cannot be calculated precisely. (3) A query expansion method based on the association matrix computed from local information. The query expansion process is as follows. First, within the top-ranked documents retrieved by the original query, the association values between terms are calculated by borrowing the main idea of automatic thesaurus construction based on the second-order association hypothesis. Second, the top-ranked terms are obtained by ranking their association values. Finally, the query is expanded by adding the top-ranked terms and their weights to the original query. (4) A query translation method based on the mutual information between terms. The method provides a new way to select the translations of a term and indirectly preserves the phrase information in the query through each term's association list. A target-language query is finally constructed in which each term has its own weight. (5) An analysis of methods for improving the retrieval performance of a traditional information retrieval system by using the structural information in XML documents. By introducing a document structure index database, an element index database, and an attribute index database, structured retrieval over XML documents is achieved. The Chinese structured information retrieval system CSIR is designed, and some of its important parts are implemented.
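The three query expansion steps described in the abstract can be sketched roughly as below. This is an illustrative sketch only, not the dissertation's implementation: the abstract gives no formulas, so the cosine similarity between co-occurrence vectors is an assumed stand-in for the second-order association value, the weight of 1.0 for original query terms is an assumption, and the names `expand_query`, `top_docs`, and `k` are hypothetical. Documents are assumed to be already segmented into words.

```python
import math
from collections import Counter


def expand_query(query_terms, top_docs, k=2):
    """Expand a query using term associations in the top-ranked documents.

    query_terms: list of terms in the original query.
    top_docs: list of documents, each a list of already segmented terms.
    k: number of expansion terms to add (hypothetical parameter).
    Returns a dict mapping each term to a weight.
    """
    vocab = sorted({t for doc in top_docs for t in doc})

    # First-order association: number of top documents in which
    # two distinct terms co-occur.
    cooc = {t: Counter() for t in vocab}
    for doc in top_docs:
        terms = set(doc)
        for a in terms:
            for b in terms:
                if a != b:
                    cooc[a][b] += 1

    def cosine(u, v):
        # Cosine similarity between two sparse co-occurrence vectors.
        dot = sum(u[x] * v[x] for x in u if x in v)
        nu = math.sqrt(sum(c * c for c in u.values()))
        nv = math.sqrt(sum(c * c for c in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    # Assumed second-order association: two terms are related when they
    # co-occur with similar sets of other terms.
    scores = {}
    for t in vocab:
        if t in query_terms:
            continue
        scores[t] = sum(cosine(cooc[q], cooc[t]) for q in query_terms if q in cooc)

    # Rank candidate terms by association value (ties broken alphabetically).
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

    # Original terms get an assumed weight of 1.0; each expansion term
    # carries its association value as its weight.
    expanded = {q: 1.0 for q in query_terms}
    for term, score in ranked:
        if score > 0:
            expanded[term] = score
    return expanded


if __name__ == "__main__":
    docs = [
        ["xml", "index", "query"],
        ["xml", "index", "element"],
        ["query", "retrieval"],
    ]
    print(expand_query(["xml"], docs, k=2))
```

Because the score compares co-occurrence vectors rather than raw co-occurrence counts, a term can rank highly without appearing most often alongside the query term, which is the point of a second-order association.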
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/6176
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name/ File Size Content Type Version Access License
LW008621.pdf (2409KB) -- -- Restricted access -- Contact the repository for the full text

Recommended Citation:
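Zhang Yibo (张毅波). Research and Implementation of a Chinese Structured Information Retrieval System (中文结构化信息检索系统的研究与实现) [D]. Institute of Software, Chinese Academy of Sciences. 2001-01-01.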