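Design and Implementation of a Blog Mining and Recommendation System (Blog挖掘和推荐系统的设计和实现)
Alternative title: Design and Implementation of a Mining and Recommendation System for Blog
Kang Nan (康楠)
2007-06-04
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Degree: Doctorate
Place of degree conferral: Institute of Software
Keywords: Blog search; Blog mining; text classification; feature selection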
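Abstract: As Web2.0 technologies have matured, Blog, an important Web2.0 application, has attracted a growing number of Internet users with its personalized publishing platform and diverse content formats. Writing and browsing Blogs has become a new trend in Internet culture and has directly driven the development of Blog search services. Most current Blog search services are based on matching query keywords and cannot automatically extract user interests or make recommendations. This thesis designs and implements Blog-digger, a mining and recommendation system for Blogs. The system uses Blog mining techniques to identify user interests automatically and to proactively recommend topically related Blogs. The thesis first gives an overview of Web2.0 technologies, then examines text classification techniques in detail, analyzing the characteristics and performance of the methods involved and selecting those better suited to mining Blogs. It also studies the Blog ranking problem in some depth: it introduces link-based web page ranking algorithms and explains why they are ill-suited to Blog ranking, then analyzes two existing Blog ranking algorithms and points out their limitations. The thesis proposes a new ranking method based on Blog content features. The method uses the RankBoost algorithm from machine learning to obtain an expression that quantifies how popular a Blog is. In tests using the nDCG measure, the new algorithm's ranking results improve on existing Blog ranking algorithms by 14.5%. The thesis also describes the design and implementation of the Blog-digger system in detail, including its architecture, server components, and client components, and presents the system's workflow and its operation in practice.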
Alternative abstract: As Web2.0 technologies mature, Blog attracts a growing number of users by providing a personalized publishing platform and presenting a wide range of content. Writing and browsing Blogs has become a new Internet fashion, which in turn promotes the development of Blog search services. However, most current Blog search services provide only keyword-based matching; they neither extract user interests automatically nor recommend interest-related Blogs. This thesis describes the design and implementation of Blog-digger, an interest mining and recommendation system. The system adopts interest mining technology to identify the user's interests and proactively recommend interest-related Blogs. The thesis gives a brief introduction to Web2.0 technologies, followed by a detailed discussion of text classification techniques. It analyzes the characteristics and performance of the various methods involved in text classification, which informs the selection of a suitable method for interest mining. The thesis also studies the Blog ranking problem: it first introduces the classic page ranking algorithms for web pages and their poor fit for Blog ranking, then analyzes two existing Blog ranking algorithms and points out their inherent limitations. Finally, a new ranking method based on statistical features of Blog content is proposed. Using the RankBoost algorithm from the machine learning field, the thesis gives a quantitative characterization of the popularity of a Blog. Measured by nDCG, the new ranking algorithm improves ranking results by 14.5% over the existing Blog ranking algorithms. The thesis then describes the design and implementation of Blog-digger in detail, including its architecture, server components, and client components, and presents its workflow and running state.
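The abstract reports its ranking improvement in terms of nDCG (normalized discounted cumulative gain). As a rough illustration of what that measure computes, here is a minimal Python sketch. It assumes the commonly used gain `2^rel - 1` and discount `log2(rank + 1)`; the abstract does not state which variant or cutoff the thesis used, and the function names `dcg` and `ndcg` and the relevance grades in the usage note are hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumed variant: gain (2^rel - 1), discount log2(rank + 1), ranks from 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking.

    `relevances` lists the grade of each item in the order the ranker
    returned it. `k` is an optional cutoff (hypothetical parameter).
    """
    ranked = list(relevances) if k is None else list(relevances)[:k]
    # The ideal ranking sorts all grades from most to least relevant.
    ideal = sorted(relevances, reverse=True)[: len(ranked)]
    ideal_dcg = dcg(ideal)
    # If nothing is relevant, define nDCG as 0 to avoid dividing by zero.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

For example, with made-up grades, `ndcg([3, 2, 1, 0])` is 1.0 because the list is already in ideal order, while `ndcg([0, 1])` is lower because the only relevant item is ranked second.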
Pages: 77
Language: Chinese
Content type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/7588
Collection: Institute of Software, Chinese Academy of Sciences
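Recommended citation
GB/T 7714
Kang Nan. Design and Implementation of a Blog Mining and Recommendation System [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences, 2007.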
Files in this item
File name/size: 10001_20042801502902 (5064KB)
Access type: Restricted access