Institutional Repository of the Institute of Software, Chinese Academy of Sciences
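Title:
Research on WWW Meta-Search Engines and the Experimental System LMSE (WWW元搜索引擎研究及实验系统LMSE)
Author: Hou Yuna (侯玉娜)
Defense date: 2000
Major: Computer Application Technology
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Doctoral
Keywords: meta-search engine; index database; hyperlink; spider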
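Abstract: This thesis outlines the origin, development, and working principles of the World Wide Web, introduces the problem of retrieving information on the WWW, surveys the major current search engines, and analyzes their shared characteristics and individual strengths and weaknesses. Because the WWW is enormous, lacks good structure, and is made up of autonomous Web servers, the single index database of any major search engine can hardly cover all Web resources. Moreover, because each search engine uses a different standard for evaluating document relevance, queries cannot be precise, which causes users great inconvenience. Combining the results of multiple search engines, selecting the entries most relevant to the user's query, discarding entries of little relevance, and presenting the results through a unified interface is therefore worthwhile work, and it is the background of this thesis. Besides the current mainstream index databases, there is another kind of search engine: the meta-search engine. Meta-search is a query method built on existing index information systems. Typically, it sends the user's query simultaneously to multiple search engines that maintain their own databases, then de-duplicates, ranks, and otherwise organizes the results they return before responding to the user. This thesis studies the general structure and key technologies of meta-search engines and proposes a meta-search engine based on a link model. It differs from other meta-search engines in that it processes the results returned by the index systems with a link-based algorithm, HITS (Hyperlink-Induced Topic Search). When computing the relevance of each web page to the user's query, this algorithm considers not only the page's text but also makes full use of the hyperlink information in HTML files, overcoming the limitations of traditional, purely text-based search methods. The thesis also describes the implementation of the experimental system; test data from the system demonstrate the effectiveness of HITS in computing relevance. The experimental data also demonstrate the advantage of meta-search engines over traditional search engines in data coverage. To some extent, this thesis addresses search engines' problems with recall and precision. Finally, it summarizes the work and suggests directions for future work.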
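The merge step described in the abstract (collect ranked results from several engines, remove duplicates, and rank the combined list) can be sketched as follows. This is only an illustration under stated assumptions: the function names `normalize_url` and `merge_results`, the engine names, and the reciprocal-rank scoring are all hypothetical placeholders. The thesis ranks with HITS, and the abstract does not say how LMSE detects duplicates.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different forms compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, drop the fragment, keep the query string.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def merge_results(result_lists: dict[str, list[str]]) -> list[tuple[str, float, list[str]]]:
    """Merge ranked URL lists from several engines into one de-duplicated list.

    The score is a placeholder (sum of reciprocal ranks across engines),
    assumed here for illustration; it is not the method used by LMSE.
    Returns (url, score, engines_that_returned_it) tuples, best first.
    """
    scores: dict[str, float] = {}
    sources: dict[str, list[str]] = {}
    for engine, urls in result_lists.items():
        seen_here = set()
        for rank, url in enumerate(urls, start=1):
            key = normalize_url(url)
            if key in seen_here:  # the same engine listed the page twice
                continue
            seen_here.add(key)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            sources.setdefault(key, []).append(engine)
    # Sort by descending score; break ties by URL for a stable order.
    ranked = sorted(scores, key=lambda k: (-scores[k], k))
    return [(k, scores[k], sources[k]) for k in ranked]


if __name__ == "__main__":
    demo = {
        "engine_a": ["http://Example.com/a/", "http://example.com/b"],
        "engine_b": ["http://example.com/a", "http://example.com/c"],
    }
    for url, score, engines in merge_results(demo):
        print(f"{score:.2f}  {url}  {engines}")
```

In a real meta-search engine the URL lists would come from querying each engine concurrently; that part is left out because it depends on each engine's interface.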
English abstract: This thesis summarizes the origin, development, and working principles of the World Wide Web, raises the problem of searching the network, surveys the main current search engines, and analyzes their common characteristics and their particular advantages and disadvantages. Because the WWW contains an enormous amount of information, lacks good structure, and is autonomous, it is impossible for a single search engine to cover all Web resources. Because each search engine uses different criteria for evaluating document relevance, queries will be inaccurate, which greatly inconveniences users. Synthesizing the results of multiple search engines, choosing the more relevant entries, discarding the less relevant ones, and presenting users with a uniform interface is therefore significant work, and it is the background of this thesis. There is a third kind of search engine: the meta-search engine. Meta-search engines are query methods based on existing index databases. They send queries simultaneously to multiple web search engines, integrate the search results, merge duplicate findings into one entry, rank the results according to various criteria, and finally present the results to the user. This thesis studies the general structure and key technologies of meta-search engines and proposes the Link-based Meta-Search Engine (LMSE). This search engine adopts the HITS (Hyperlink-Induced Topic Search) algorithm to process the results of the multiple web search engines. In computing the relevance between the user query and the web pages, HITS considers not only the text of the web pages but also information extracted from the link structure of the network environment. The thesis then introduces the experimental system. Results from the system show that HITS is effective for computing relevance. The system also indicates that meta-search engines return more comprehensive results. To a certain extent, this thesis solves the problem of obtaining more complete and higher-quality information. Finally, we summarize the work and raise several questions for further improvement and research.
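As a rough illustration of the link-based scoring the abstract refers to, here is a minimal sketch of the standard HITS iteration: each page gets an authority score (sum of the hub scores of pages linking to it) and a hub score (sum of the authority scores of pages it links to), with both normalized every round. The function name `hits`, the fixed iteration count, and the toy graph are assumptions for illustration. The abstract does not describe how LMSE builds its link graph from the engines' results or how it combines these scores with text information.

```python
import math


def _normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(links: dict[str, list[str]], iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Return (authority, hub) scores for a directed link graph.

    `links` maps each page to the pages it links to. Pages that appear
    only as link targets are added to the graph automatically.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to each page.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in set(targets):  # count repeated links once
                new_auth[dst] += hub[src]
        # Hub update: sum the new authority scores of the pages linked to.
        new_hub = {
            n: sum(new_auth[d] for d in set(links.get(n, ())))
            for n in nodes
        }
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return auth, hub


if __name__ == "__main__":
    # Toy graph: two hub-like pages pointing at two candidate results.
    graph = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
    authority, hub_scores = hits(graph)
    for page in sorted(authority, key=authority.get, reverse=True):
        print(f"{page}: authority={authority[page]:.3f} hub={hub_scores[page]:.3f}")
```

In this toy graph, page `a` is linked from both hub pages and so ends up with the highest authority score, which is the kind of signal a purely text-based ranking would miss.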
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/5888
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name / File Size, Content Type, Version, Access, License
LW002143.pdf (2723KB), --, --, Restricted access, -- (contact the repository to obtain the full text)

Recommended Citation:
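Hou Yuna. Research on WWW Meta-Search Engines and the Experimental System LMSE (WWW元搜索引擎研究及实验系统LMSE) [D]. Institute of Software, Chinese Academy of Sciences. Institute of Software, Chinese Academy of Sciences. 2000-01-01.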

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences