Research on WWW Meta-Search Engines and the Experimental System LMSE

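	Research on WWW Meta-Search Engines and the Experimental System LMSE
	Hou Yuna (侯玉娜)
Major	Computer Application Technology
	2000
Degree-granting institution	Institute of Software, Chinese Academy of Sciences
Degree	Doctorate
Place of degree conferral	Institute of Software, Chinese Academy of Sciences
Keywords	meta-search engine; index database; hyperlink; spider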
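Abstract	This thesis gives an overview of the origin, development, and working principles of the World Wide Web, introduces the problem of information retrieval on the WWW, and surveys the major current search engines, analyzing their shared characteristics and their individual strengths and weaknesses. Because the WWW is very large, lacks good structure, and consists of autonomous Web servers, the single index database of any major search engine can hardly cover all Web resources. In addition, because search engines use different criteria to evaluate document relevance, queries cannot be exact, which causes users great inconvenience. Combining the results of multiple search engines, selecting the entries more relevant to the user's query, discarding entries of little relevance, and presenting the results through a unified interface is therefore worthwhile work, and it is the background of this thesis. Besides the current mainstream index databases, there is another kind of search engine, the meta-search engine. Meta-search is a query method built on existing index information systems: the user's query is generally sent simultaneously to several search engines that maintain their own databases, and the results they return are deduplicated, sorted, and otherwise consolidated before the final response is returned to the user. This thesis studies the general structure and key technologies of meta-search engines and proposes a meta-search engine based on a link model. It differs from other meta-search engines in that it processes the results returned by the individual index systems with a link-based algorithm, HITS (Hyperlink-Induced Topic Search). When computing the relevance of each Web page to the user's query, the algorithm considers not only the page's text but also makes full use of the hyperlink information in HTML files, overcoming the limitations of traditional search methods based on plain text alone. The thesis also describes the implementation of the experimental system; test data from the system demonstrate the effectiveness of HITS in computing relevance, and also show the advantage of meta-search engines over traditional search engines in data coverage. To a certain extent, this thesis addresses the recall and precision problems of search engines. It closes with a summary and directions for further work.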
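The merging step described in the abstract (send one query to several engines, then deduplicate and order the combined results) can be sketched as follows. This is a minimal illustration, not the LMSE implementation: the function names `normalize_url` and `merge_results`, the URL normalization rule, and the reciprocal-rank scoring are all assumptions introduced here. The thesis ranks with a link-based algorithm instead.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key for a URL (hypothetical normalization rule).

    The scheme and fragment are ignored, the host is lowercased, and a
    trailing slash on the path is dropped.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_results(result_lists):
    """Merge ranked URL lists from several engines, dropping duplicates.

    Each URL is scored by the sum of its reciprocal ranks across engines.
    This ordering is a placeholder chosen for the sketch, not the
    link-based ranking described in the abstract.
    """
    scores = {}
    first_seen = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            key = normalize_url(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            # Keep the first spelling of the URL that was encountered.
            first_seen.setdefault(key, url)
    ordered = sorted(scores, key=lambda k: -scores[k])
    return [first_seen[k] for k in ordered]
```

With two hypothetical result lists that share one page under different spellings, the shared page appears once in the merged output and is ranked ahead of pages returned by only one engine.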
Other abstract	This thesis summarizes the origin, development, and working principles of the World Wide Web, raises the problem of searching the network, studies the main current search engines, and analyzes their common characteristics and their particular advantages and disadvantages. Because the WWW contains an enormous amount of information, lacks good structure, and is made up of autonomous servers, it is impossible for a single search engine to cover all Web resources. Because each search engine uses different criteria to evaluate document relevance, queries will be inexact, which greatly inconveniences users. Synthesizing the results of multiple search engines, keeping the more relevant entries, discarding the less relevant ones, and presenting users with a uniform interface is therefore significant work. This thesis was produced against that background. There is another kind of search engine, the meta-search engine. Meta-search engines are query methods based on existing index databases. They send queries simultaneously to multiple Web search engines, integrate the search results, merge duplicate findings into one entry, rank the results according to various criteria, and finally present the results to the user. This thesis studies the general structure and key technologies of meta-search engines and puts forward the Link-based Meta-Search Engine (LMSE). This search engine adopts the HITS (Hyperlink-Induced Topic Search) algorithm to process the results of the multiple Web search engines. In computing the relevance between the user query and Web pages, HITS considers not only the text of the pages but also information extracted from the link structure of the network environment. The thesis then introduces the experimental system. Results from the system show that HITS is effective for computing relevance. They also indicate that meta-search engines cover a more comprehensive set of items.
To a certain extent, this thesis solves the problem of obtaining more complete and higher-quality information. Finally, we summarize the work and raise several questions for further improvement and research.
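The HITS algorithm named in the abstract assigns each page a hub score and an authority score that reinforce each other through the link structure. The sketch below shows the standard iterative update on a small link graph. It is an illustration under stated assumptions: the function names `hits` and `_normalize`, the dictionary-based graph representation, and the fixed iteration count are choices made here, and the abstract does not say how LMSE builds its graph from the merged results or combines the scores with text information.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores for a directed link graph.

    `links` maps each page to the list of pages it links to (an assumed
    representation). Returns two dicts: (hub, authority).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of pages linked to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, ())) for p in pages}
        # Rescale both vectors to unit length so the scores stay bounded.
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


def _normalize(scores):
    """Scale scores to unit Euclidean length; leave all-zero scores as is."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {p: v / norm for p, v in scores.items()}
```

In a graph where a page is linked from more hubs than another, it receives the higher authority score, and a page linking to more authorities receives the higher hub score. In a meta-search setting, one plausible use is to rank merged results by authority score, though that choice is an assumption here.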
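Pages	60
Language	Chinese
Content type	Thesis
URI	http://ir.iscas.ac.cn/handle/311060/5888
Collection	Institute of Software, CAS_Institute of Software, CAS
Recommended citation (GB/T 7714)	Hou Yuna. Research on WWW Meta-Search Engines and the Experimental System LMSE[D]. Institute of Software, Chinese Academy of Sciences. Institute of Software, Chinese Academy of Sciences, 2000.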

Files in this item
File name/size	Document type	Version type	Access type	License
LW002143.pdf (2723KB)			Restricted access	--