Institutional Repository of the Institute of Software, Chinese Academy of Sciences
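Title: 基于网络链接图的反欺诈技术研究 (Research on Anti-Spam Techniques Based on the Web Link Graph)
Author: 武磊 (Wu Lei)
Defense date: 2007-06-04
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place conferred: Institute of Software
Degree: Doctoral
Keywords: Spam techniques; Spam detection; structural information; temporal information
Alternative title: The Research on Spam Detection by Leveraging Web Link Graph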
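Abstract: Websites on the Internet can use Spam techniques to raise their ranking in search engines and thereby gain economic benefit. At the same time, these Spam techniques distort the normal ranking results of search engines and pose a major challenge to them. For this reason, researchers have long studied algorithms for detecting Spam techniques. This thesis studies how to effectively detect Spam techniques that target page importance. Our analysis finds that current algorithms fall mainly into two kinds: those based on structural information and those based on temporal information. Targeting the Spam techniques that are currently popular, this thesis designs and extracts a large number of structural and temporal features from web link graphs, and uses machine learning to train a structural-information-based Spam website classifier and a temporal-information-based Spam website classifier separately, obtaining good experimental results. On this basis, considering the respective strengths of structural and temporal information and the diversity of websites, this thesis designs and implements a method that combines structural and temporal features to detect Spam techniques. To improve prediction accuracy, the method selects a different classifier to predict a website's nature depending on how the website appears in the web link graphs. The method achieved good results in a real-world web application. Finally, by introducing Spam Detector, a tool built on the Spam website classifiers, the thesis shows the benefits that detecting Spam techniques brings.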
English abstract: Spamming techniques can promote the ranking of websites in the result lists of Web search engines and thus bring revenue to the owners of those websites. However, Spamming techniques disturb the proper ranking of websites, so Web Spam has become one of the greatest challenges for commercial search engines. Some research efforts have been made to combat Web Spam. This thesis focuses on how to effectively detect Spam techniques that target page importance. Previous work can be classified into two categories: one is based on structural information and the other is based on temporal information. In this thesis, targeting some popular Spamming techniques, we extract a group of well-designed structural and temporal features from a series of link graphs, and train a structural-information-based classifier and a temporal-information-based classifier separately to distinguish normal websites from Spam websites. Experimental data show that both classifiers are quite effective for Web Spam detection. Additionally, because of the diversity among websites, we propose a novel framework that combines structural information and temporal information for Spam detection. Experiments on a real-world dataset demonstrate that the proposed method performs quite well in Web Spam detection. Finally, we present an example, Spam Detector, which is implemented on top of the Web Spam classifiers, to show the benefit brought by this technology.
Language: Chinese
Content type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/7568
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name / File Size | Content Type | Version | Access | License
10001_200428015029041武磊_paper.doc (1944KB) | -- | -- | Restricted access (contact the repository to obtain the full text) | --

Recommended Citation:
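武磊 (Wu Lei). 基于网络链接图的反欺诈技术研究 [The Research on Spam Detection by Leveraging Web Link Graph] [D]. Institute of Software, Chinese Academy of Sciences. 2007-06-04.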