PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions

Given a set D of tuples defined on a domain Omega, we study differentially private algorithms for constructing a histogram over Omega to approximate the tuple distribution in D. Existing solutions for the problem mostly adopt a hierarchical decomposition approach, which recursively splits Omega into sub-domains and computes a noisy tuple count for each sub-domain, until all noisy counts are below a certain threshold. This approach, however, requires that we (i) impose a limit h on the recursion depth in the splitting of Omega and (ii) set the noise in each count to be proportional to h. The choice of h is a serious dilemma: a small h makes the resulting histogram too coarse-grained, while a large h leads to excessive noise in the tuple counts used in deciding whether sub-domains should be split. Furthermore, h cannot be directly tuned based on D; otherwise, the choice of h itself reveals private information and violates differential privacy. To remedy the deficiency of existing solutions, we present PrivTree, a histogram construction algorithm that adopts hierarchical decomposition but completely eliminates the dependency on a pre-defined h. The core of PrivTree is a novel mechanism that (i) exploits a new analysis on the Laplace distribution and (ii) enables us to use only a constant amount of noise in deciding whether a sub-domain should be split, without worrying about the recursion depth of splitting. We demonstrate the application of PrivTree in modelling spatial data, and show that it can be extended to handle sequence data (where the decision in sub-domain splitting is not based on tuple counts but a more sophisticated measure). Our experiments on a variety of real datasets show that PrivTree considerably outperforms the states of the art in terms of data utility.

[1]  Divesh Srivastava,et al.  Differentially Private Spatial Decompositions , 2011, 2012 IEEE 28th International Conference on Data Engineering.

[2]  Andreas Haeberlen,et al.  DJoin: differentially private join queries over distributed databases , 2012, OSDI 2012.

[3]  Divesh Srivastava,et al.  Private Release of Graph Statistics using Ladder Functions , 2015, SIGMOD Conference.

[4]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[5]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2016, J. Priv. Confidentiality.

[6]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[7]  Ninghui Li,et al.  PrivBasis: Frequent Itemset Mining with Differential Privacy , 2012, Proc. VLDB Endow..

[8]  Cynthia Dwork,et al.  Privacy, accuracy, and consistency too: a holistic solution to contingency table release , 2007, PODS.

[9]  Ashwin Machanavajjhala,et al.  On the Privacy Properties of Variants on the Sparse Vector Technique , 2015, ArXiv.

[10]  Hanan Samet,et al.  Foundations of multidimensional and metric data structures , 2006, Morgan Kaufmann series in data management systems.

[11]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[12]  Jeffrey F. Naughton,et al.  On differentially private frequent itemset mining , 2012, Proc. VLDB Endow..

[13]  Dan Suciu,et al.  Boosting the accuracy of differentially private histograms through consistency , 2009, Proc. VLDB Endow..

[14]  Ninghui Li,et al.  Understanding Hierarchical Methods for Differentially Private Histograms , 2013, Proc. VLDB Endow..

[15]  Gerome Miklau,et al.  Optimal error of query sets under the differentially-private matrix mechanism , 2012, ICDT '13.

[16]  Divesh Srivastava,et al.  Accurate and efficient private release of datacubes and contingency tables , 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[17]  Johannes Gehrke,et al.  Differential privacy via wavelet transforms , 2009, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[18]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[19]  Gerome Miklau,et al.  An Adaptive Mechanism for Accurate Query Answering under Differential Privacy , 2012, Proc. VLDB Endow..

[20]  Elisa Bertino,et al.  Differentially Private K-Means Clustering , 2015, CODASPY.

[21]  B. Barak,et al.  A study of privacy and fairness in sensitive data analysis , 2011 .

[22]  Andrew McGregor,et al.  Optimizing linear counting queries under differential privacy , 2009, PODS.

[23]  Chris Clifton,et al.  Top-k frequent itemsets via differentially private FP-trees , 2014, KDD.

[24]  Ninghui Li,et al.  Differentially private grids for geospatial data , 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[25]  Marianne Winslett,et al.  Differentially private data cubes: optimizing noise sources and consistency , 2011, SIGMOD '11.

[26]  Assaf Schuster,et al.  Data mining with differential privacy , 2010, KDD.

[27]  Sofya Raskhodnikova,et al.  Smooth sensitivity and sampling in private data analysis , 2007, STOC '07.

[28]  Aaron Roth,et al.  Differentially private combinatorial optimization , 2009, SODA '10.

[29]  Vitaly Shmatikov,et al.  Airavat: Security and Privacy for MapReduce , 2010, NSDI.

[30]  Yue Wang,et al.  A Data- and Workload-Aware Algorithm for Range Queries Under Differential Privacy , 2014, ArXiv.

[31]  Alexandre B. Tsybakov,et al.  Introduction to Nonparametric Estimation , 2008, Springer series in statistics.

[32]  Benjamin C. M. Fung,et al.  Differentially private transit data publication: a case study on the montreal transportation system , 2012, KDD.

[33]  Paul Francis,et al.  Towards Statistical Queries over Distributed Private User Data , 2012, NSDI.

[34]  Gerome Miklau,et al.  Exponential random graph estimation under differential privacy , 2014, KDD.

[35]  Sofya Raskhodnikova,et al.  Private analysis of graph structure , 2011, Proc. VLDB Endow..

[36]  Ilya Mironov,et al.  Differentially private recommender systems: building privacy into the net , 2009, KDD.

[37]  Jing Lei,et al.  Differentially Private M-Estimators , 2011, NIPS.

[38]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[39]  Yin Yang,et al.  Low-Rank Mechanism: Optimizing Batch Queries under Differential Privacy , 2012, Proc. VLDB Endow..

[40]  Ran El-Yaniv,et al.  On Prediction Using Variable Order Markov Models , 2004, J. Artif. Intell. Res..

[41]  Guy N. Rothblum,et al.  A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[42]  Kunal Talwar,et al.  On the geometry of differential privacy , 2009, STOC '10.

[43]  Shuigeng Zhou,et al.  Recursive mechanism: towards node differential privacy and unrestricted joins , 2013, SIGMOD '13.

[44]  BentleyJon Louis Multidimensional binary search trees used for associative searching , 1975 .

[45]  Suman Nath,et al.  Differentially private aggregation of distributed time-series with transformation and encryption , 2010, SIGMOD Conference.

[46]  Daniel Kifer,et al.  Private Convex Optimization for Empirical Risk Minimization with Applications to High-dimensional Regression , 2012, COLT.

[47]  Daniel A. Spielman,et al.  Spectral Graph Theory and its Applications , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[48]  Dana Ron,et al.  The power of amnesia: Learning probabilistic automata with variable memory length , 1996, Machine Learning.

[49]  Chun Yuan,et al.  Differentially Private Data Release through Multidimensional Partitioning , 2010, Secure Data Management.

[50]  Yin Yang,et al.  PrivGene: differentially private model fitting using genetic algorithms , 2013, SIGMOD '13.

[51]  Anand D. Sarwate,et al.  Differentially Private Empirical Risk Minimization , 2009, J. Mach. Learn. Res..

[52]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[53]  Claude Castelluccia,et al.  Differentially private sequential data publication via variable-length n-grams , 2012, CCS.

[54]  F. Frances Yao,et al.  Computational Geometry , 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[55]  Xiaoqian Jiang,et al.  Differentially Private Histogram Publication for Dynamic Datasets: an Adaptive Sampling Approach , 2015, CIKM.

[56]  Adam D. Smith,et al.  Privacy-preserving statistical estimation with optimal convergence rates , 2011, STOC '11.

[57]  Jun Zhang,et al.  PrivBayes: private data release via bayesian networks , 2014, SIGMOD Conference.

[58]  Yu Zhang,et al.  Differentially Private High-Dimensional Data Publication via Sampling-Based Inference , 2015, KDD.

[59]  Katrina Ligett,et al.  A Simple and Practical Algorithm for Differentially Private Data Release , 2010, NIPS.