Differentially Private Frequent Itemset Mining Against Incremental Updates

Differential privacy has recently been applied to frequent itemset mining (FIM). Most existing works focus on improving result utility while satisfying differential privacy. However, they all target a “one-shot” release of a static dataset, which does not adequately address the growing need for up-to-date sensitive information. In this paper, we address the problem of differentially private FIM over dynamic datasets and propose a scheme for infinite incremental updates that satisfies \(\epsilon \)-differential privacy in any sliding window. To limit the perturbation error that grows with incremental updates, we design an adaptive budget allocation scheme that accounts for changes in the transactional dataset. To reduce the high sensitivity of each one-shot release, we split long transactions and analyze the resulting information loss. We then privately compute the approximate number of frequent itemsets. Based on these results, we design a threshold exponential mechanism to privately release frequent itemsets. Through formal privacy analysis, we show that our scheme satisfies \(\epsilon \)-differential privacy in any sliding window. Extensive experiments on real-world datasets show that our scheme achieves high utility and efficiency.
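The abstract combines three generic building blocks: a privacy budget that must not exceed ε over any sliding window, transaction splitting to keep sensitivity low, and an exponential-mechanism release of itemsets. The Python sketch below illustrates textbook versions of these blocks under stated assumptions. It is not the paper's algorithm. The names `SlidingWindowBudget`, `split_transaction`, `supports`, and `exp_mech_top_k` are hypothetical. The "spend a change-scaled half of what is left in the window" rule is an assumed heuristic. The candidate itemsets are assumed to be fixed independently of the data. The privacy cost of measuring the change score, and of choosing k (which the abstract says is computed privately), is not modeled. A plain top-k exponential mechanism stands in for the paper's threshold exponential mechanism.

```python
import math
import random
from collections import deque


class SlidingWindowBudget:
    """Hypothetical w-event budget tracker: the spends of any w consecutive
    releases sum to at most epsilon."""

    def __init__(self, epsilon: float, w: int):
        self.epsilon = epsilon
        # Only the previous w-1 spends matter for the next allocation.
        self.recent = deque(maxlen=w - 1)

    def remaining(self) -> float:
        return self.epsilon - sum(self.recent)

    def allocate(self, change: float) -> float:
        # Assumed heuristic: spend up to half of what is left in the window,
        # scaled by a change score in [0, 1]. Measuring `change` privately
        # would also cost budget; that cost is NOT modeled here.
        change = min(max(change, 0.0), 1.0)
        spend = 0.5 * change * self.remaining()
        self.recent.append(spend)
        return spend


def split_transaction(items, max_len):
    """Partition a transaction into disjoint chunks of at most max_len items.
    An itemset fits in at most one chunk, so adding or removing one transaction
    changes any itemset's support by at most 1; itemsets spanning chunks are lost."""
    ordered = sorted(items)
    return [frozenset(ordered[i:i + max_len]) for i in range(0, len(ordered), max_len)]


def supports(transactions, candidates, max_len):
    """Count, for each candidate itemset, the chunks that contain it."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for chunk in split_transaction(t, max_len):
            for c in candidates:
                if c <= chunk:
                    counts[c] += 1
    return counts


def exp_mech_top_k(counts, k, epsilon, rng):
    """Pick k itemsets without replacement using k rounds of the exponential
    mechanism (utility = support, sensitivity 1, epsilon/k per round)."""
    eps_round = epsilon / k
    remaining = dict(counts)
    chosen = []
    for _ in range(min(k, len(remaining))):
        keys = list(remaining)
        top = max(remaining.values())
        # Shifting by the max support leaves probabilities unchanged but avoids overflow.
        weights = [math.exp(eps_round * (remaining[c] - top) / 2.0) for c in keys]
        pick = rng.choices(keys, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    budget = SlidingWindowBudget(epsilon=1.0, w=3)
    cands = [frozenset(s) for s in ({"a"}, {"b"}, {"c"}, {"a", "b"}, {"b", "c"})]
    batches = [
        [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d", "e"}],
        [{"a", "b"}, {"a", "b", "d"}, {"b", "c"}],
    ]
    for t, (batch, change) in enumerate(zip(batches, [1.0, 0.4])):
        eps_t = budget.allocate(change)
        counts = supports(batch, cands, max_len=2)
        released = exp_mech_top_k(counts, k=2, epsilon=eps_t, rng=rng)
        print(t, round(eps_t, 3), [sorted(s) for s in released])
```

Because each allocation is capped by what the previous w-1 releases left over, every window of w releases stays within ε by construction. Splitting into disjoint chunks keeps the support sensitivity at 1 at the cost of losing itemsets that straddle chunks, which is the kind of information loss the abstract says must be analyzed.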
