Title: | 中文结构化信息检索系统的研究与实现 |
Author: | 张毅波
|
Issued Date: | 2001
|
Major: | Computer Software and Theory
|
Degree Grantor: | 中国科学院软件研究所
|
Place of Degree Grantor: | 中国科学院软件研究所
|
Degree Level: | Doctoral
|
Keyword: | Information retrieval
; Structured query
; Association matrix
; Query expansion
; Chinese information processing
|
Alternative Title: | Research on and Implementation of Chinese Structured Information Retrieval
|
Abstract: | 该文研究的主题包含中文信息检索与结构化信息检索两个方面,分别从中文信息检索系统中文档与查询条件相似性计算、查询扩展、查询条件的翻译及结构化信息检索等四点进行了研究与探讨.该文的主要贡献如下:(1)论述了基于中文语词的索引方法是中文信息检索系统中必然采取的索引方法.系统地阐述了一种新的计算检索词间关联关系的方法-基于PM的检索词对权重计算方法,并将其引入到文档与查询条件的相似性计算中.(2)研究了中文信息检索系统中检索词对的邻近关系值与互信息值对系统检索性能的影响,实验结果表明当检索词对互信息计算的精度较低时,检索词对的邻近关系值比互信息值对系统的检索性能的提高更有帮助.(3)提出了基于局部信息中检索词间关联矩阵的查询扩展方法.在由初始查询条件得到的前列文档集中,采用基于第二级关联假设自动主题词表的构建思想计算得到前列检索词及其权重值,并加入到初始查询条件中实现查询扩展.(4)提出了基于检索词间互信息的查询条件翻译方法,为查询条件中检索词的译项选择提供了新的方法,并间接地通过检索词的关联序列较好地保存了查询条件中的短语信息,构造出了检索词带有权重信息的目标语种的查询条件.(5)分析了利用XML文档中的结构信息来提高传统信息检索系统检索性能的方法.通过引入文档结构索引库、元素索引库及属性索引库实现了面向XML文档的结构化查询,设计出了中文结构化信息检索系统CSIR,并实现了其主要的一些功能. |
English Abstract: | This dissertation addresses two themes: Chinese information retrieval and structured information retrieval. It investigates four aspects: the similarity calculation between documents and queries, query expansion, query translation, and structured information retrieval. The main contributions are as follows. (1) It argues that word-based indexing is the indexing method a Chinese information retrieval system must adopt. It systematically presents a new method for computing the association between terms, the PM-based weight calculation of term pairs, and introduces it into the similarity calculation between documents and queries. (2) It studies how the proximity and the mutual information of term pairs affect retrieval performance in a Chinese information retrieval system. The experiments show that when the mutual information of term pairs can only be computed with low precision, the proximity of term pairs improves retrieval performance more than their mutual information does. (3) It proposes a query expansion method based on the association matrix among terms in local information. Within the top-ranked documents retrieved by the original query, association values between terms are computed by borrowing the idea of automatic thesaurus construction based on the second-order association hypothesis; the top-ranked terms and their weights are then added to the original query to expand it. (4) It proposes a query translation method based on the mutual information between terms. The method offers a new way to select translations for each query term, indirectly preserves the phrase information in the query through the terms' association lists, and constructs a target-language query in which each term carries a weight. (5) It analyzes methods for using the structural information in XML documents to improve the retrieval performance of traditional information retrieval systems. By introducing a document structure index database, an element index database, and an attribute index database, it achieves structured retrieval over XML documents; the Chinese structured information retrieval system CSIR is designed, and some of its main functions are implemented. |
Language: | Chinese
|
Content Type: | Thesis
|
URI: | http://ir.iscas.ac.cn/handle/311060/6176
|
Appears in Collections: | 中科院软件所
|
File Name/ File Size | Content Type | Version | Access | License |
|
LW008621.pdf(2409KB) | -- | -- | Restricted access | -- | Contact for full text |
|
Recommended Citation: |
张毅波. 中文结构化信息检索系统的研究与实现[D]. 中国科学院软件研究所. 2001-01-01.
|