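Title: Blog挖掘和推荐系统的设计和实现
Alternative Title: Design and Implementation of a Mining and Recommendation System for Blog
Author: 康楠
Date: 2007-06-04
Degree Grantor: Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)
Degree Level: Doctoral
Place of Degree Grantor: Institute of Software
Keywords: Blog search; Blog mining; text classification; feature selection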
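Abstract (translated from Chinese): As Web 2.0 technologies have matured, the Blog, an important Web 2.0 application, has attracted a growing number of Internet users with its personalized publishing platform and diverse content formats. Writing and reading Blogs has become a new trend in online culture and has directly driven the development of Blog search services. Most current Blog search services are implemented by matching query keywords and lack the ability to extract user interests automatically and make recommendations. This thesis designs and implements Blog-digger, a mining and recommendation system for Blogs, which uses Blog mining techniques to identify user interests automatically and proactively recommend topically related Blogs. The thesis first gives an overview of Web 2.0 technologies, then examines text classification in detail, analyzing the characteristics and performance of the methods involved in order to select those better suited to mining Blogs. It also studies the Blog ranking problem in some depth: it introduces link-based web page ranking algorithms and explains why they are unsuitable for ranking Blogs, then analyzes two existing Blog ranking algorithms and points out their limitations. The thesis proposes a new ranking method based on Blog content features. The method uses the RankBoost algorithm from machine learning to obtain an expression that quantifies how popular a Blog is. In the nDCG benchmark test, the new algorithm's ranking results improve on the existing Blog ranking algorithms by 14.5%. The thesis describes the design and implementation of the Blog-digger system in detail, including its architecture, server components, and client components, and presents the system's workflow and its behavior in actual operation.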
Abstract: As Web 2.0 technologies mature, Blogs attract a growing number of users by providing a personalized publishing platform and presenting a wide range of content. Writing and browsing Blogs has become a new fashion on the Internet, which in turn promotes the development of Blog search services. However, most current Blog search services provide only keyword-based matching; they do not extract user interests automatically or recommend Blogs related to those interests. This thesis describes the design and implementation of Blog-digger, an interest mining and recommendation system. The system adopts interest mining technology to identify a user's interests and proactively recommend interest-related Blogs. The thesis first gives a brief introduction to Web 2.0 technologies, followed by a detailed discussion of text classification techniques. It analyzes the characteristics and performance of the various methods involved in text classification, which informs the selection of a suitable method for interest mining. The thesis also studies the Blog ranking problem in depth: it first introduces the classic page ranking algorithms for web pages and explains why they are poorly suited to Blog ranking, then analyzes two existing Blog ranking algorithms and points out their inherent limitations. Finally, it proposes a new ranking method based on statistical data about Blog content. Using the RankBoost algorithm from the field of machine learning, the thesis derives a quantitative characterization of a Blog's popularity. Measured on the standard nDCG benchmark, the new ranking algorithm improves ranking results by 14.5% compared with the existing Blog ranking algorithm. The thesis then describes the design and implementation of Blog-digger in detail, including its architecture, server components, and client components, and presents its workflow and running state.
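The abstract reports its ranking improvement in terms of nDCG (normalized discounted cumulative gain). As a rough illustration of what that metric measures, the sketch below computes nDCG for two rankings. It assumes one common formulation (gain `2^rel - 1`, discount `log2(position + 1)`); the thesis may use a different variant, and the relevance grades and the names `dcg`, `ndcg`, `baseline`, and `candidate` are hypothetical, not taken from the thesis.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance grades listed in ranked order."""
    # Position 0 is the top result, so the discount is log2(pos + 2).
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades (0-3) for five Blogs, in the order
# returned by two imaginary rankers; this is not data from the thesis.
baseline = [1, 3, 0, 2, 1]
candidate = [3, 2, 1, 1, 0]

print(f"baseline nDCG:  {ndcg(baseline):.3f}")
print(f"candidate nDCG: {ndcg(candidate):.3f}")
```

A ranking that places the most relevant Blogs first scores closer to 1, so a relative gain in nDCG means the relevant items have moved toward the top of the list.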
Pages: 77
Language: Chinese
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/7588
Collection: Institute of Software, Chinese Academy of Sciences (中科院软件所_中科院软件所)
Recommended Citation
GB/T 7714
康楠. Blog挖掘和推荐系统的设计和实现[D]. 软件研究所. 中国科学院软件研究所,2007.
Files in This Item:
File Name/Size DocType Version Access License
10001_20042801502902 (5064KB) Restricted access --
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.