Differentially Private Frequent Itemset Mining via Transaction Splitting

Interest in designing differentially private data mining algorithms has grown in recent years. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. In this paper, we explore whether a differentially private FIM algorithm can achieve not only high data utility and a high degree of privacy, but also high time efficiency. To this end, we propose PFP-growth, a differentially private FIM algorithm based on the FP-growth algorithm. PFP-growth consists of a preprocessing phase and a mining phase. In the preprocessing phase, to improve the utility-privacy tradeoff, we propose a novel smart splitting method to transform the database. For a given database, the preprocessing phase needs to be performed only once. In the mining phase, to offset the information loss caused by transaction splitting, we devise a run-time estimation method to estimate the actual support of itemsets in the original database. In addition, by leveraging the downward closure property, we put forward a dynamic reduction method that reduces the amount of noise added to guarantee privacy during the mining process. Through formal privacy analysis, we show that PFP-growth is ε-differentially private. Extensive experiments on real datasets show that PFP-growth substantially outperforms state-of-the-art techniques.
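The ideas named in the abstract can be illustrated with a small sketch. It is not the PFP-growth algorithm: the splitting rule here is a naive cut into consecutive chunks rather than the paper's smart splitting, no run-time estimation or dynamic reduction is attempted, and all function names, the toy database, and the `sensitivity` and `epsilon` parameters are assumptions made for illustration. It shows only three things: how splitting a transaction can lower the measured support of an itemset, that support is non-increasing under supersets (downward closure), and how Laplace noise of scale sensitivity/ε is added to a count.

```python
import random
from itertools import combinations


def split_transaction(transaction, max_len):
    """Naively cut a transaction into consecutive chunks of at most max_len items.

    Hypothetical stand-in for a splitting step; the paper's smart splitting
    method is not reproduced here.
    """
    items = sorted(transaction)
    return [frozenset(items[i:i + max_len]) for i in range(0, len(items), max_len)]


def support(itemset, database):
    """Number of transactions in the database that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in database if itemset <= t)


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_support(itemset, database, sensitivity, epsilon, rng):
    """Support plus Laplace noise of scale sensitivity / epsilon.

    The caller must supply a sensitivity that is valid for the query being
    answered; this sketch does not derive it.
    """
    return support(itemset, database) + laplace_noise(sensitivity / epsilon, rng)


# Toy database (assumed for illustration only).
original = [frozenset("abcd"), frozenset("abc"), frozenset("ad"), frozenset("bcd")]
split_db = [chunk for t in original for chunk in split_transaction(t, max_len=2)]

# Splitting separates items that used to co-occur, so some supports drop.
print(support("bc", original), support("bc", split_db))

# Downward closure: a superset is never more frequent than any of its subsets.
items = sorted(set().union(*original))
closure_holds = all(
    support(pair, original) <= min(support(pair[0], original), support(pair[1], original))
    for pair in combinations(items, 2)
)
print(closure_holds)

rng = random.Random(0)
print(noisy_support("ab", original, sensitivity=1.0, epsilon=1.0, rng=rng))
```

On the toy data, the itemset {b, c} appears in three original transactions but in only one sub-transaction after splitting, which is the kind of information loss an estimation step in the mining phase would need to correct.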
