ISCAS OpenIR
差分隐私下的一种频繁序列模式挖掘方法
Alternative TitleFrequent Sequential Pattern Mining under Differential Privacy
卢国庆; 张啸剑; 丁丽萍; 李彦峰; 廖鑫
2015
Source计算机研究与发展
ISSN1000-1239
Volume52Issue:12Pages:2789-2801
English Abstract频繁序列模式挖掘是数据挖掘领域的1个基本问题,然而模式本身及其支持度计数都有可能泄露用户隐私信息.差分隐私(differential privacy, DP)作为一种新出现的隐私保护技术,定义了一个相当严格的攻击模型,通过添加噪音使数据失真达到隐私保护的目的.由于序列数据内在序列性和高维度的特点 ,给差分隐私应用于频繁序列模式挖掘带来了挑战.对此提出了一种基于交互式差分隐私保护框架的频繁序列模式挖掘算法Diff-FSPM(differen tial-privacy frequent sequential pattern mining).该算法利用指数机制获取最优序列长度,并采用一种维规约策略获得原始序列数据集的规约表示,有效降低序列维度的影响;应用前缀树压缩频繁 序列模式,利用拉普拉斯机制产生的噪音扰动频繁模式的真实支持度计数,同时采用闭频繁序列模式和Markov假设,有效分配隐私预算,并利用一致性约束后 置处理,增强输出模式的可用性.理论角度证明算法满足epsilon-差分隐私,实验结果验证算法具有较好的可用性.
Indexed TypeCSCD
AbstractFrequent sequential pattern mining is an exploratory problem in the field of data mining. However, directly releasing the discovered frequent patterns and the corresponding true supports may reveal the individuals' privacy. The state-of-the-art solution for this problem is differential privacy, which offers a strong degree of privacy protection by adding noise. Due to the inherent sequentiality and high-dimensionality in sequences, it is challenging to apply differential privacy to frequent sequential pattern mining. To address those problems, this paper presents a differentially private method called Diff-FSPM that uses an interactive way to find frequent patterns. To reduce the impact of dimensionality, Diff-FSPM first employs the exponential mechanism to obtain the optimal sequential length for truncating each sequence in the original dataset. After that, Diff-FSPM relies on a prefix-tree to compress all the frequent patterns, and then utilizes the Laplace mechanism to perturb the true support of the frequent patterns. To efficiently allocate the privacy budget, Diff-FSPM uses the closet frequent pattern and Markov assumption to guide the mining process. Finally, Diff-FSPM adopts a post-processing technique with consistency constraint to boost the accuracy of the returned noisy support counts. We theoretically prove that Diff-FSPM satisfies epsilon-differential privacy, and the experimental results show that it outperforms the existing methods in terms of utility.
Keyword频繁序列模式 数据挖掘 差分隐私 隐私保护 前缀树
Department卢国庆, 中国科学院软件研究所, 北京 100190, 中国;丁丽萍, 中国科学院软件研究所, 北京 100190, 中国;李彦峰, 中国科学院软件研究所, 北京 100190, 中国;廖鑫, 中国科学院软件研究所, 北京 100190, 中国;张啸剑, 河南财经政法大学计算机与信息工程学院, 郑州, 河南 450002, 中国;
Language中文
CSCD IDCSCD:5591487
Content Type期刊论文
URIhttp://ir.iscas.ac.cn/handle/311060/17390
Collection中国科学院软件研究所
Recommended Citation
GB/T 7714
卢国庆,张啸剑,丁丽萍,等. 差分隐私下的一种频繁序列模式挖掘方法[J]. 计算机研究与发展,2015,52(12):2789-2801.
APA 卢国庆,张啸剑,丁丽萍,李彦峰,&廖鑫.(2015).差分隐私下的一种频繁序列模式挖掘方法.计算机研究与发展,52(12),2789-2801.
MLA 卢国庆,et al."差分隐私下的一种频繁序列模式挖掘方法".计算机研究与发展 52.12(2015):2789-2801.
Files in This Item:
File Name/Size DocType Version Access License
差分隐私下的一种频繁序列模式挖掘方法.p(2396KB) 开放获取LicenseApplication Full Text
Related Services
Recommend this item
Bookmark
Usage statistics
Export to Endnote
Google Scholar
Similar articles in Google Scholar
[卢国庆]'s Articles
[张啸剑]'s Articles
[丁丽萍]'s Articles
Baidu academic
Similar articles in Baidu academic
[卢国庆]'s Articles
[张啸剑]'s Articles
[丁丽萍]'s Articles
Bing Scholar
Similar articles in Bing Scholar
[卢国庆]'s Articles
[张啸剑]'s Articles
[丁丽萍]'s Articles
Terms of Use
No data!
Social Bookmark/Share
All comments (0)
No comment.
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.