Research on WWW Meta-Search Engines and the Experimental System LMSE
侯玉娜
Major: Computer Application Technology
2000
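Degree Grantor: Institute of Software, Chinese Academy of Sciences
Degree Level: Doctoral
Place of Degree Grantor: Institute of Software, Chinese Academy of Sciences
Keywords: meta-search engine; index database; hyperlink; spider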
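English Abstract: This thesis gives an overview of the origin, development, and working principles of the World Wide Web, introduces the problem of information retrieval on the WWW, and examines the major current search engines, analyzing their shared characteristics and their individual strengths and weaknesses. Because the WWW is enormous, lacks a well-defined structure, and consists of autonomous Web servers, the single index database of any one major search engine can hardly cover all Web resources. Moreover, because search engines use different criteria to evaluate document relevance, queries cannot be exact, which causes users considerable inconvenience. Combining the results of several search engines, selecting the entries most relevant to the user's query, discarding those of little relevance, and presenting the outcome through a unified interface is therefore a worthwhile task. The work in this thesis arose against this background. Besides the current mainstream index databases, there is another kind of search engine: the meta-search engine. Meta-search is a query method built on existing index information systems. Its usual approach is to send the user's query simultaneously to several search engines that maintain their own databases, consolidate the returned results by removing duplicates, ranking, and similar processing, and finally return them to the user. This thesis studies the general structure and key technologies of meta-search engines and proposes a meta-search engine based on a link model. It differs from other meta-search engines in how it processes the results returned by the index systems: it uses a link-based algorithm, HITS (Hyperlink-Induced Topic Search). When computing the relevance of each web page to the user's query, the algorithm considers not only the text of the page but also makes full use of the hyperlink information in HTML files, overcoming the limitations of traditional purely text-based search methods. The thesis also describes the implementation of the experimental system. Test data from the system demonstrate the effectiveness of HITS in computing relevance, and also show that meta-search engines offer better data coverage than traditional search engines. To some extent, this thesis addresses the problems search engines face in recall and precision. Finally, the thesis is summarized and directions for further work are proposed.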
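The consolidation step described in the abstract (forward one query to several engines, remove duplicates, rank what remains) can be sketched as follows. This is a minimal illustration only: the abstract does not say how LMSE normalizes URLs or scores merged entries, so `normalize_url`, `merge_results`, and the reciprocal-rank scoring are hypothetical choices made for this example, and the result lists are assumed to have been fetched already.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key for duplicate detection (assumed rule, not from the thesis).

    The key ignores the scheme, the fragment, host case, and a trailing slash.
    """
    parts = urlsplit(url.strip())
    key = parts.netloc.lower() + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(result_lists):
    """Merge ranked URL lists from several engines into one deduplicated ranking.

    Each input list is ordered best-first. A URL earns 1/rank from every
    engine that returned it (a hypothetical scoring rule), so entries found
    by several engines, or ranked highly by one, move to the top.
    """
    scores = {}
    first_seen = {}
    for results in result_lists:
        seen_in_list = set()
        for rank, url in enumerate(results, start=1):
            key = normalize_url(url)
            if key in seen_in_list:
                continue  # count each engine at most once per URL
            seen_in_list.add(key)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)
    # Sort by descending score; break ties by key so the order is deterministic.
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return [first_seen[k] for k in ordered]


if __name__ == "__main__":
    engine_a = ["http://a.example/x", "http://b.example/"]
    engine_b = ["http://B.example", "http://c.example/y"]
    for url in merge_results([engine_a, engine_b]):
        print(url)
```

In this example the entry returned by both engines ranks first, even though neither copy of the URL is spelled identically, which is the kind of duplicate merging the abstract refers to.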
Abstract: This thesis summarizes the origin, development, and working principles of the World Wide Web, raises the problem of searching the network, surveys the main current search engines, and analyzes their common characteristics and their individual advantages and disadvantages. Because the WWW contains an enormous amount of information, lacks a good structure, and is made up of autonomous servers, it is impossible for a single search engine to cover all Web resources. Because each search engine uses different criteria to evaluate document relevance, query results will be inexact, which causes users great inconvenience. Combining the results of multiple search engines, choosing the more relevant entries, discarding the less relevant ones, and presenting them to users through a uniform interface is therefore significant work. This thesis was produced against that background. There is another kind of search engine: the meta-search engine. Meta-search engines are query methods built on existing index databases. They send queries simultaneously to multiple web search engines, integrate the search results, merge duplicate findings into one entry, rank the results according to various criteria, and finally present them to the user. This thesis studies the general structure and key technologies of meta-search engines and proposes the Link-based Meta-Search Engine (LMSE). LMSE adopts the HITS (Hyperlink-Induced Topic Search) algorithm to process the results of the multiple web search engines. In computing the relevance between the user query and the web pages, HITS considers not only the text of the pages but also information extracted from the link structure of the network. The thesis then introduces the experimental system. Results from the system show that HITS is effective for computing relevance, and that meta-search engines return more comprehensive results. To a certain extent, this thesis solves the problem of obtaining more complete and higher-quality information. Finally, we summarize the work and raise several questions for further improvement and research.
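The link analysis behind HITS can be illustrated with a short sketch of the standard hub and authority iteration. It assumes the link graph among the candidate pages has already been extracted; the function names and the `links` dictionary format are hypothetical, and the sketch does not reproduce how LMSE combines these scores with text information, which the abstract does not specify.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Run the basic HITS iteration on a small link graph.

    links maps each page to the pages it links to (assumed input format).
    Returns (hub, authority) dictionaries keyed by page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages that link to a page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in set(targets):
                new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of the pages a page links to.
        new_hub = {
            p: sum(new_auth[dst] for dst in set(links.get(p, ())))
            for p in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": []}
    hub, auth = hits(graph)
    print(sorted(auth, key=auth.get, reverse=True))
```

In the toy graph, page `c` is linked to by both other pages and so receives the highest authority score, while `a` and `b` share equal hub scores. A meta-search engine could use authority scores of this kind to reorder merged results, though the weighting used in the thesis is not given here.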
Pages: 60
Language: Chinese
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/5888
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
侯玉娜. WWW元搜索引擎研究及实验系统LMSE[D]. 中国科学院软件研究所. 中国科学院软件研究所,2000.
Files in This Item:
File Name/Size DocType Version Access License
LW002143.pdf (2723KB), restricted access
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.