基于网络链接图的反欺诈技术研究
Alternative Title: The Research on Spam Detection by Leveraging Web Link Graph
武磊
2007-06-04
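Degree Grantor: Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所)
Degree Level: Doctoral
Place of Degree Grantor: Institute of Software
Keywords: Spam techniques; Spam detection; structural information; temporal information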
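Abstract (translated from the Chinese original): Websites on the Internet can use Spam techniques to raise their ranking in search engines and thereby gain economic benefit. At the same time, these techniques interfere with the search engines' normal ranking results and pose a serious challenge to them. For this reason, algorithms for detecting Spam techniques have long been studied.

This thesis studies how to effectively detect Spam techniques that target page importance. Our analysis finds that current algorithms fall into two main groups: those based on structural information and those based on temporal information. Targeting the Spam techniques that are currently popular, this thesis designs and extracts a large number of structural and temporal features from Web link graphs, uses machine learning to train a structural-information-based Spam website classifier and a temporal-information-based Spam website classifier separately, and obtains good experimental results.

Building on this, and considering the respective strengths of structural and temporal information as well as the diversity of websites, this thesis designs and implements a method that combines structural and temporal features to detect Spam techniques. It selects different classifiers to predict the nature of a website depending on how the website appears in the Web link graphs, in order to improve prediction accuracy. The method achieves good results in a real Web application. Finally, by introducing Spam Detector, a tool implemented on top of the Spam website classifiers, this thesis shows the benefits that detecting Spam techniques brings.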
Abstract: Websites can use Spamming techniques to raise their ranking in the result lists of Web search engines and thereby bring revenue to their owners. However, these techniques distort the proper ranking of websites, so Web Spam has become one of the greatest challenges for commercial search engines, and a number of research efforts have been made to combat it. This thesis focuses on how to effectively detect Spamming techniques that target page importance. Previous work falls into two categories: methods based on structural information and methods based on temporal information. In this thesis, targeting several popular Spamming techniques, we extract a group of carefully designed structural and temporal features from a series of link graphs, and separately train a structural-information-based classifier and a temporal-information-based classifier to distinguish normal websites from Spam websites. Experimental results show that both classifiers are effective for Web Spam detection. In addition, because of the diversity among websites, we propose a novel framework that combines structural and temporal information for Spam detection. Experiments on a real-world dataset demonstrate that the proposed method performs well in Web Spam detection. Finally, we present Spam Detector, a tool built on the Web Spam classifiers, to show the benefit brought by this technology.
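The abstract describes choosing between a structural classifier and a temporal classifier according to how a website appears in the link graphs. The following is only a minimal sketch of that idea, not the thesis's implementation: the features (in-degree, out-degree, change of in-degree between snapshots), the dispatch rule (`min_snapshots`), the toy threshold classifiers, and all function names are assumptions made for illustration. Each snapshot is assumed to map every site it contains to the set of sites linking to it.

```python
from typing import Callable, Dict, List, Set

# One link graph snapshot: every site present in the snapshot is a key,
# mapped to the set of sites that link to it (possibly empty).
LinkGraph = Dict[str, Set[str]]
Classifier = Callable[[List[float]], bool]  # True means "predicted Spam"


def structural_features(graph: LinkGraph, site: str) -> List[float]:
    """Illustrative structural features: in-degree and out-degree in one snapshot."""
    in_degree = len(graph[site])
    out_degree = sum(1 for sources in graph.values() if site in sources)
    return [float(in_degree), float(out_degree)]


def temporal_features(snapshots: List[LinkGraph], site: str) -> List[float]:
    """Illustrative temporal features: in-degree change between consecutive
    snapshots in which the site is present."""
    degrees = [len(g[site]) for g in snapshots if site in g]
    return [float(later - earlier) for earlier, later in zip(degrees, degrees[1:])]


def classify_site(
    snapshots: List[LinkGraph],
    site: str,
    structural_clf: Classifier,
    temporal_clf: Classifier,
    min_snapshots: int = 2,  # assumed dispatch rule, not taken from the thesis
) -> bool:
    """Use the temporal classifier when the site has enough history,
    otherwise fall back to the structural classifier on its latest snapshot."""
    present = [g for g in snapshots if site in g]
    if not present:
        raise KeyError(f"{site} does not appear in any snapshot")
    if len(present) >= min_snapshots:
        return temporal_clf(temporal_features(snapshots, site))
    return structural_clf(structural_features(present[-1], site))


# Toy stand-ins for trained classifiers (hypothetical thresholds).
def toy_structural_clf(features: List[float]) -> bool:
    return features[0] >= 50  # very high in-degree


def toy_temporal_clf(features: List[float]) -> bool:
    return max(features) >= 2  # sudden jump in in-links


snapshots: List[LinkGraph] = [
    {"a.example": {"b.example"}, "b.example": set()},
    {
        "a.example": {"b.example", "c.example", "d.example"},
        "b.example": set(),
        "c.example": set(),
        "d.example": set(),
        "new.example": {"a.example"},
    },
]

a_is_spam = classify_site(snapshots, "a.example", toy_structural_clf, toy_temporal_clf)
new_is_spam = classify_site(snapshots, "new.example", toy_structural_clf, toy_temporal_clf)
```

In this toy data, `a.example` appears in both snapshots and is judged by the temporal classifier, while `new.example` appears only in the second snapshot and is judged by the structural one. In practice the two classifiers would be trained models rather than fixed thresholds.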
Pages: 71
Language: Chinese
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/7568
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
武磊. 基于网络链接图的反欺诈技术研究[D]. 软件研究所. 中国科学院软件研究所,2007.
Files in This Item:
File Name/Size DocType Version Access License
10001_20042801502904 (1944KB) Restricted access -- Application Full Text

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.