Institutional Repository
| Title | 基于网络链接图的反欺诈技术研究 |
| Alternative Title | The Research on Spam Detection by Leveraging Web Link Graph |
| Author | 武磊 (Wu Lei) |
| Date | 2007-06-04 |
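| Degree Grantor | Institute of Software, Chinese Academy of Sciences (中国科学院软件研究所) |
| Degree Level | Doctoral |
| Place of Degree Grantor | Institute of Software (软件研究所) |
| Keyword | Spam techniques; Spam detection; structural information; temporal information |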
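| Abstract (translated from Chinese) | Websites can use spam techniques to boost their ranking in search engines and thereby gain economic benefit. At the same time, these techniques distort the search engines' normal ranking results and pose a major challenge for them, so researchers have long studied algorithms for detecting spam. This thesis studies how to effectively detect spam techniques that target page importance. Our analysis finds that current algorithms fall into two main groups: those based on structural information and those based on temporal information. Targeting currently popular spam techniques, we design and extract a large number of structural and temporal features from Web link graphs, and use machine learning to train a structure-based spam site classifier and a temporal-based spam site classifier separately; both achieve good experimental results. Building on this, and considering the respective strengths of structural and temporal information as well as the diversity of websites, we design and implement a method that combines structural and temporal features to detect spam. It chooses among different classifiers to predict a site's nature according to how the site appears in the link graphs, in order to improve prediction accuracy. The method performs well in a real Web application. Finally, we present Spam Detector, a tool built on the spam site classifiers, to show the benefits that spam detection brings. |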
| Abstract | Spamming techniques can raise a website's rank in the result lists of Web search engines, bringing revenue to the site's owner. However, these techniques distort the legitimate ranking of websites, so Web spam has become one of the greatest challenges for commercial search engines, and a number of research efforts have been made to combat it. This thesis focuses on how to effectively detect spamming techniques that target page importance. Previous work falls into two categories: methods based on structural information and methods based on temporal information. To detect several popular spamming techniques, we extract a group of carefully designed structural and temporal features from a series of link graphs, and separately train a structure-based classifier and a temporal-based classifier to distinguish normal websites from spam websites. Experimental data show that both classifiers are effective for Web spam detection. In addition, because of the diversity among websites, we propose a novel framework that combines structural and temporal information for spam detection. Experiments on a real-world dataset demonstrate that the proposed method performs well. Finally, we present an example, Spam Detector, which is built on the Web spam classifiers, to show the benefit of this technology. |
| Pages | 71 |
| Language | Chinese |
| Content Type | Thesis/Dissertation |
| URI | http://ir.iscas.ac.cn/handle/311060/7568 |
| Collection | Institute of Software, Chinese Academy of Sciences (中科院软件所) |
| Recommended Citation GB/T 7714 | 武磊. 基于网络链接图的反欺诈技术研究[D]. 软件研究所. 中国科学院软件研究所,2007. |
| Files in This Item: |
| File Name/Size | DocType | Version | Access | License |
| 10001_20042801502904 (1944 KB) | | | Restricted access | -- |
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.