PrivBasis: Frequent Itemset Mining with Differential Privacy

The discovery of frequent itemsets can serve valuable economic and research purposes. Releasing discovered frequent itemsets, however, presents privacy challenges. In this paper, we study the problem of how to perform frequent itemset mining on transaction databases while satisfying differential privacy. We propose an approach, called PrivBasis, which leverages a novel notion called basis sets. A θ-basis set has the property that any itemset with frequency higher than θ is a subset of some basis. We introduce algorithms for privately constructing a basis set and then using it to find the most frequent itemsets. Experiments show that our approach greatly outperforms the current state of the art.
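The θ-basis property can be made concrete with a small, deliberately non-private sketch. It does not show how PrivBasis privately constructs a basis set. It only brute-forces the itemsets whose frequency exceeds θ and checks whether every one of them is a subset of some basis. It also lists the candidate itemsets a basis set covers, which are all non-empty subsets of its bases. Several things here are assumptions: the function names (`support`, `frequent_itemsets`, `is_theta_basis`, `candidates`) are hypothetical, "frequency" is taken to mean the fraction of transactions containing the itemset, and the exhaustive enumeration is only practical for toy data.

```python
from itertools import combinations

def support(itemset, transactions):
    """Assumed frequency measure: fraction of transactions containing every item."""
    s = frozenset(itemset)
    return sum(1 for t in transactions if s <= t) / len(transactions)

def frequent_itemsets(transactions, theta):
    """Non-private brute force: every non-empty itemset with frequency > theta."""
    items = sorted(set().union(*transactions))
    result = []
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            if support(combo, transactions) > theta:
                result.append(frozenset(combo))
                found_any = True
        # Downward closure: if no k-itemset is frequent, no larger itemset can be.
        if not found_any:
            break
    return result

def is_theta_basis(bases, transactions, theta):
    """True if every itemset with frequency > theta is a subset of some basis."""
    return all(any(f <= b for b in bases)
               for f in frequent_itemsets(transactions, theta))

def candidates(bases):
    """Hypothetical helper: all non-empty subsets of some basis."""
    out = set()
    for b in bases:
        items = sorted(b)
        for k in range(1, len(items) + 1):
            out.update(frozenset(c) for c in combinations(items, k))
    return out

# Toy database of four transactions over items a, b, c, d.
T = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("cd")]
print(is_theta_basis([frozenset("ab"), frozenset("cd")], T, 0.4))               # True
print(is_theta_basis([frozenset("a"), frozenset("b"), frozenset("cd")], T, 0.4))  # False: {a, b} is not covered
```

With θ = 0.4, the frequent itemsets in this toy database are {a}, {b}, {c}, {d} and {a, b}. The basis set {{a, b}, {c, d}} covers all of them with only six candidates, while splitting {a, b} into two singleton bases leaves the frequent pair uncovered. In this sketch the candidate set grows with the power sets of the bases, so a basis set with few, short bases keeps the search space small.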
