Institutional Repository of the Institute of Software, Chinese Academy of Sciences
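Title: Blog挖掘和推荐系统的设计和实现 (Design and Implementation of a Blog Mining and Recommendation System)
Author: Kang Nan (康楠)
Defense date: 2007-06-04
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software
Degree: Doctorate
Keywords: Blog search; Blog mining; text classification; feature selection
Alternative title: Design and Implementation of a Mining and Recommendation System for Blog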
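Abstract: As Web 2.0 technology has matured, blogs, an important Web 2.0 application, have attracted more and more Internet users through their personalized publishing platforms and diverse content formats. Writing and reading blogs has become a new trend in online culture and has directly driven the development of blog search services. Most current blog search services are based on matching query keywords and lack the ability to automatically extract user interests and make recommendations. This thesis designs and implements Blog-digger, a mining and recommendation system for blogs that uses blog mining techniques to automatically identify a user's interests and proactively recommend topically related blogs. The thesis first gives an overview of Web 2.0 technology, then examines text classification in detail, analyzing the characteristics and performance of the methods involved in order to select those better suited to blog mining. It also studies the blog ranking problem in some depth: it introduces link-based web page ranking algorithms and explains why they are unsuitable for blog ranking, then analyzes two existing blog ranking algorithms and points out their limitations. The thesis proposes a new ranking method based on blog content features. The method uses the RankBoost algorithm from machine learning to obtain an expression that quantifies how popular a blog is. In the nDCG standard test, the ranking results of the new algorithm improve on the existing blog ranking algorithm by 14.5%. Finally, the thesis describes the design and implementation of the Blog-digger system in detail, including its architecture, server components, and client components, and presents the system's workflow and how the deployed system performs in practice.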
English abstract: As Web 2.0 technologies mature, blogs attract a growing number of users by offering a personalized publishing platform and a wide range of content. Writing and browsing blogs has become a new Internet trend, which in turn has driven the development of blog search services. However, most current blog search services provide only keyword-based matching; they neither extract user interests automatically nor recommend blogs related to those interests. This thesis describes the design and implementation of Blog-digger, an interest mining and recommendation system. The system adopts interest mining technology to identify a user's interests and proactively recommend interest-related blogs. The thesis first gives a brief introduction to Web 2.0 technologies and then discusses text classification techniques in detail. It analyzes the characteristics and performance of the methods involved in text classification, which informs the selection of a suitable method for interest mining. The thesis also studies the blog ranking problem in depth: it introduces the classic page ranking algorithms for web pages and explains why they are poorly suited to blog ranking, then analyzes two existing blog ranking algorithms and points out their inherent limitations. Finally, a new ranking method based on statistical data about blog content is proposed. Using the RankBoost algorithm from machine learning, the thesis gives a quantitative characterization of a blog's popularity. Measured by nDCG, the new ranking algorithm improves ranking results by 14.5% compared with the existing blog ranking algorithm. The thesis then describes the design and implementation of Blog-digger in detail, including its architecture, server components, and client components, and presents the system's workflow and running state.
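For readers unfamiliar with the nDCG measure cited in the abstract, the following is a minimal sketch of how nDCG is commonly computed. It is not taken from the thesis: this record does not state which gain function, relevance grades, or cutoff the author used, so the exponential gain `2**rel - 1`, the `log2` position discount, and the function names `dcg` and `ndcg` are assumptions chosen for illustration.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumed form: gain 2**rel - 1, discounted by log2(rank + 1),
    where rank is the 1-based position in the list.
    """
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """nDCG@k: DCG of the given ranking divided by DCG of the ideal ranking."""
    if k is None:
        k = len(relevances)
    # The ideal ranking lists the same items in descending order of relevance.
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all, so the score is defined as 0 here
    return dcg(relevances[:k]) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical relevance grades for the blogs returned by two rankers.
    baseline_ranking = [1, 3, 0, 2]
    new_ranking = [3, 2, 1, 0]
    print("baseline nDCG:", ndcg(baseline_ranking))
    print("new nDCG:", ndcg(new_ranking))
```

A ranking that places the most relevant blogs first scores 1.0; any other ordering of the same items scores lower, which is what makes the measure usable for comparing two ranking algorithms on the same result set.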
Language: Chinese
Content type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/7588
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name/ File Size Content Type Version Access License
10001_200428015029023康楠_paper.doc (5064KB) -- -- Restricted access -- Contact to request full text

Recommended Citation:
Kang Nan (康楠). Blog挖掘和推荐系统的设计和实现[D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2007-06-04.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Copyright © 2007-2017 Institute of Software, Chinese Academy of Sciences