Institutional Repository of the Institute of Software, Chinese Academy of Sciences
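Title: 面向主题的WWW信息挖掘及实验系统TWIMS (Topic-Oriented WWW Information Mining and the Experimental System TWIMS)
Author: 余晨 (Yu Chen)
Defense date: 2002
Major: Computer Application Technology
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software, Chinese Academy of Sciences
Degree: Doctorate
Keywords: search engine; topic; crawler; authority page; hub page
Alternative title: Topic-Driven Web Information Mining and The Design and Implementation of TWIMS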
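Abstract: This thesis first gives an overview of the origin, development, and working principles of the World Wide Web, which leads to the problem of Web retrieval, and then analyzes how current search engines work. In recent years, WWW retrieval focused on a single topic has received growing attention, and the concept of focused crawling has been proposed. Building on this and incorporating data mining techniques, the thesis proposes a topic-oriented WWW information mining framework. The framework can collect topic-relevant Web pages efficiently and in real time under limited hardware, software, and network resources. More importantly, it can perform link structure analysis and related-topic analysis on the retrieved topic-relevant pages, mining the topic as fully as possible, a capability that general-purpose search engines lack. The thesis consists of five chapters. Chapter 1 introduces the current state of the WWW and of WWW search technology. Chapter 2 analyzes general search engine technology. Chapter 3 presents the topic-oriented WWW information mining framework. Chapter 4 describes the design and implementation of the prototype system TWIMS and discusses the key techniques involved in developing each module, including data structures, core algorithms and process-flow analysis, and the implementation of multithreading control. Chapter 5 concludes the thesis and outlines directions for future work.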
English abstract: This thesis summarizes the origin, development, and working principles of the World Wide Web, raises the problem of Web search, and studies the main current search engines. With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to collect and analyze Web pages that are relevant to a particular topic. In this thesis, a Topic-Driven Web Information Gathering system is presented, which efficiently collects Web pages for a topic with relatively limited hardware and network resources and keeps the pages more up to date. The thesis is organized as follows. Chapter 1 analyzes the current state of the WWW and of search technology on the WWW. Chapter 2 discusses general search engine technology. Chapter 3 describes a framework for Topic-Driven Web Information Mining based on Web mining and focused crawling technology. Chapter 4 presents the design and implementation of a test system, TWIMS, and describes the core technologies and solutions used in its development, including the design of data structures, algorithms, and multithreading. Chapter 5 closes the thesis with a conclusion and an outlook on further work.
Language: Chinese
Content type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/7638
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File name: LW008635.pdf (2553 KB)
Access: restricted; contact the repository to obtain the full text

Recommended Citation:
余晨 (Yu Chen). 面向主题的WWW信息挖掘及实验系统TWIMS [D]. Institute of Software, Chinese Academy of Sciences. 2002-01-01.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences