Understanding Hierarchical Methods for Differentially Private Histograms

In recent years, many approaches for publishing histograms under differential privacy have been proposed. Several of them construct tree structures to reduce the error when answering large range queries. In this paper, we examine the factors that affect the accuracy of hierarchical approaches by studying the mean squared error (MSE) of range-query answers. We start with one-dimensional histograms and analyze how the MSE changes with the branching factor, with the use of constrained inference, and with the method used to allocate the privacy budget among hierarchy levels. Our analysis and experimental results show that combining a good choice of branching factor with constrained inference outperforms the current state of the art. Finally, we extend our analysis to multi-dimensional histograms. We show that the benefits of hierarchical methods are significantly diminished beyond a single dimension, and that with 3 or more dimensions it is almost always better to use the Flat method than a hierarchy.
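To make the hierarchical approach concrete, the following is a minimal Python sketch (standard library only) that compares the Flat method with a b-ary hierarchy plus constrained inference on random range queries. It is not the authors' implementation. The function names (`laplace`, `build_tree`, `flat`, `noisy_tree`, `constrained_inference`, `range_mse`) are hypothetical. The sketch assumes the number of bins is a power of the branching factor `b` and splits the privacy budget `eps` uniformly over all tree levels including the root, which is only one possible allocation strategy. For consistency it uses the two-pass least-squares algorithm of Hay et al., which applies when every level receives the same noise scale.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_tree(leaves, b):
    # Exact b-ary tree of counts: levels[0] holds the leaves, levels[-1] the root.
    size = 1
    while b >= 2 and size < len(leaves):
        size *= b
    if b < 2 or size != len(leaves):
        raise ValueError("need b >= 2 and a number of bins that is a power of b")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sum(prev[j:j + b]) for j in range(0, len(prev), b)])
    return levels


def flat(leaves, eps, rng):
    # Flat: each bin count has sensitivity 1, so add Laplace(1/eps) noise per bin.
    return [x + laplace(1.0 / eps, rng) for x in leaves]


def noisy_tree(leaves, b, eps, rng):
    # Assumption: one record changes one count per level, and eps is split
    # uniformly over all levels, so each node gets Laplace(levels/eps) noise.
    levels = build_tree(leaves, b)
    scale = len(levels) / eps
    return [[x + laplace(scale, rng) for x in lvl] for lvl in levels]


def constrained_inference(noisy, b):
    # Two-pass least-squares consistency (Hay et al.); assumes equal noise per level.
    z = [list(noisy[0])]  # bottom-up pass; leaves have height 1
    for k in range(1, len(noisy)):
        i = k + 1  # height of the nodes stored at levels[k]
        w_self = (b ** i - b ** (i - 1)) / (b ** i - 1)
        w_kids = (b ** (i - 1) - 1) / (b ** i - 1)
        z.append([w_self * noisy[k][j] + w_kids * sum(z[k - 1][b * j:b * j + b])
                  for j in range(len(noisy[k]))])
    est = [None] * len(z)
    est[-1] = list(z[-1])  # top-down pass starts at the root
    for k in range(len(z) - 2, -1, -1):
        est[k] = []
        for j in range(len(z[k])):
            p = j // b
            siblings = sum(z[k][b * p:b * p + b])
            est[k].append(z[k][j] + (est[k + 1][p] - siblings) / b)
    # The leaves are now consistent with every ancestor, so summing the leaves
    # in a range gives the same answer as summing its canonical tree nodes.
    return est[0]


def range_mse(true_leaves, est_leaves, queries):
    # Empirical MSE over half-open ranges [l, r).
    errs = [sum(est_leaves[l:r]) - sum(true_leaves[l:r]) for l, r in queries]
    return sum(e * e for e in errs) / len(errs)


if __name__ == "__main__":
    rng = random.Random(0)
    b, h, eps = 4, 5, 1.0
    n = b ** h  # 1024 bins
    data = [rng.randint(0, 50) for _ in range(n)]
    queries = []
    for _ in range(200):
        l = rng.randrange(n)
        queries.append((l, rng.randrange(l + 1, n + 1)))
    flat_mse = range_mse(data, flat(data, eps, rng), queries)
    tree_est = constrained_inference(noisy_tree(data, b, eps, rng), b)
    tree_mse = range_mse(data, tree_est, queries)
    print(f"Flat MSE: {flat_mse:.1f}   b={b} hierarchy + CI MSE: {tree_mse:.1f}")
```

Running the module prints the empirical MSE of both methods for one random draw. Which method wins depends on the domain size, the branching factor, and `eps`, which is the trade-off the analysis studies. Swapping in a non-uniform budget allocation would also require a constrained-inference variant that weights levels by their noise variance.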
