中国科学院软件研究所机构知识库
Advanced  
ISCAS OpenIR  > 软件所图书馆  > 期刊论文
Title:
差分隐私下的一种频繁序列模式挖掘方法
Alternative Title: Frequent Sequential Pattern Mining under Differential Privacy
Author: 卢国庆 ; 张啸剑 ; 丁丽萍 ; 李彦峰 ; 廖鑫
Keyword: 频繁序列模式 ; 数据挖掘 ; 差分隐私 ; 隐私保护 ; 前缀树
Source: 计算机研究与发展
Issued Date: 2015
Volume: 52, Issue:12, Pages:2789-2801
Indexed Type: CSCD
Department: 卢国庆, 中国科学院软件研究所, 北京 100190, 中国;丁丽萍, 中国科学院软件研究所, 北京 100190, 中国;李彦峰, 中国科学院软件研究所, 北京 100190, 中国;廖鑫, 中国科学院软件研究所, 北京 100190, 中国;张啸剑, 河南财经政法大学计算机与信息工程学院, 郑州, 河南 450002, 中国;
Abstract: 频繁序列模式挖掘是数据挖掘领域的1个基本问题,然而模式本身及其支持度计数都有可能泄露用户隐私信息.差分隐私(differential privacy, DP)作为一种新出现的隐私保护技术,定义了一个相当严格的攻击模型,通过添加噪音使数据失真达到隐私保护的目的.由于序列数据内在序列性和高维度的特点 ,给差分隐私应用于频繁序列模式挖掘带来了挑战.对此提出了一种基于交互式差分隐私保护框架的频繁序列模式挖掘算法Diff-FSPM(differen tial-privacy frequent sequential pattern mining).该算法利用指数机制获取最优序列长度,并采用一种维规约策略获得原始序列数据集的规约表示,有效降低序列维度的影响;应用前缀树压缩频繁 序列模式,利用拉普拉斯机制产生的噪音扰动频繁模式的真实支持度计数,同时采用闭频繁序列模式和Markov假设,有效分配隐私预算,并利用一致性约束后 置处理,增强输出模式的可用性.理论角度证明算法满足epsilon-差分隐私,实验结果验证算法具有较好的可用性.
English Abstract: Frequent sequential pattern mining is an exploratory problem in the field of data mining. However, directly releasing the discovered frequent patterns and the corresponding true supports may reveal the individuals' privacy. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection by adding noise. Due to the inherent sequentiality and high-dimensionality in sequences, it is challenging to apply differential privacy to frequent sequential pattern mining. To address those problems, this paper presents a differentially private method called Diff-FSPM that uses an interactive way to find frequent patterns. To reduce the impact of dimensionality, Diff-FSPM first employs the exponential mechanism to obtain the optimal sequential length for truncating each sequence in the original dataset. After that, Diff-FSPM relies on a prefix-tree to compress all the frequent patterns, and then utilizes the Laplace mechanism to perturb the true support of the frequent patterns. To efficiently allocate the privacy budget, Diff-FSPM uses the closet frequent pattern and Markov assumption to guide the mining process. Finally, Diff-FSPM adopts a post-processing technique with consistency constraint to boost the accuracy of the returned noisy support counts. We theoretically prove that Diff-FSPM satisfies epsilon-differential privacy, and the experimental results show that it outperforms the existing methods in terms of utility.
Language: 中文
Citation statistics:
Content Type: 期刊论文
URI: http://ir.iscas.ac.cn/handle/311060/17390
Appears in Collections:软件所图书馆_期刊论文

Files in This Item:
File Name/ File Size Content Type Version Access License
差分隐私下的一种频繁序列模式挖掘方法.pdf(2396KB)----限制开放 联系获取全文

Recommended Citation:
卢国庆,张啸剑,丁丽萍,等. 差分隐私下的一种频繁序列模式挖掘方法[J]. 计算机研究与发展,2015-01-01,52(12):2789-2801.
Service
Recommend this item
Sava as my favorate item
Show this item's statistics
Export Endnote File
Google Scholar
Similar articles in Google Scholar
[卢国庆]'s Articles
[张啸剑]'s Articles
[丁丽萍]'s Articles
CSDL cross search
Similar articles in CSDL Cross Search
[卢国庆]‘s Articles
[张啸剑]‘s Articles
[丁丽萍]‘s Articles
Related Copyright Policies
Null
Social Bookmarking
Add to CiteULike Add to Connotea Add to Del.icio.us Add to Digg Add to Reddit
所有评论 (0)
暂无评论
 
评注功能仅针对注册用户开放,请您登录
您对该条目有什么异议,请填写以下表单,管理员会尽快联系您。
内 容:
Email:  *
单位:
验证码:   刷新
您在IR的使用过程中有什么好的想法或者建议可以反馈给我们。
标 题:
 *
内 容:
Email:  *
验证码:   刷新

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Valid XHTML 1.0!
Copyright © 2007-2019  中国科学院软件研究所 - Feedback
Powered by CSpace