Maximum Likelihood Postprocessing for Differential Privacy under Consistency Constraints

When analyzing data that has been perturbed for privacy reasons, one is often concerned about its usefulness. Recent research on differential privacy has shown that the accuracy of many data queries can be improved by post-processing the perturbed data to enforce consistency constraints that are known to hold for the original data. Most prior work cast this post-processing step as a least-squares minimization problem with customized efficient solutions. While this approach improves accuracy, it ignores the noise distribution of the perturbed data. In this paper, to further improve accuracy, we formulate the post-processing step as a constrained maximum likelihood estimation problem, which is equivalent to constrained L1 minimization. Instead of relying on slow linear program solvers, we present a faster generic recipe (based on ADMM) that is suitable for a wide variety of applications, including differentially private contingency tables, histograms, and the matrix mechanism (linear queries). An added benefit of our formulation is that it can often take direct advantage of algorithmic tricks used by prior work on least-squares post-processing. An extensive set of experiments on various datasets demonstrates that this approach significantly improves accuracy over prior work.
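To make the formulation concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes independent Laplace noise of equal scale on every noisy answer, so that maximizing the likelihood is the same as minimizing the L1 distance to the noisy answers, and it assumes the consistency constraints are linear equalities A x = b. The names `l1_postprocess`, `project_affine`, and `soft_threshold`, the penalty parameter `rho`, and the toy data are all hypothetical. The splitting used is standard scaled-form ADMM with the auxiliary variable z = x - y.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def project_affine(v, A, b):
    # Euclidean (least-squares) projection of v onto {x : A x = b}.
    # Assumes A has full row rank so that A A^T is invertible.
    return v - A.T @ np.linalg.solve(A @ A.T, A @ v - b)

def l1_postprocess(y, A, b, rho=1.0, iters=200):
    # Scaled-form ADMM for: minimize ||x - y||_1 subject to A x = b,
    # split as minimize ||z||_1 + indicator(A x = b) with x - z = y.
    z = np.zeros_like(y)
    u = np.zeros_like(y)
    for _ in range(iters):
        x = project_affine(z + y - u, A, b)       # least-squares projection step
        z = soft_threshold(x - y + u, 1.0 / rho)  # L1 proximal step
        u = u + x - z - y                         # scaled dual update
    return x

# Toy example (made-up numbers): a noisy total and three noisy child counts.
# Consistency constraint: total - c1 - c2 - c3 = 0.
y = np.array([10.0, 3.0, 4.0, 6.0])
A = np.array([[1.0, -1.0, -1.0, -1.0]])
b = np.array([0.0])

x_hat = l1_postprocess(y, A, b)
print(x_hat, A @ x_hat, np.abs(x_hat - y).sum())
```

In this sketch the only constraint-specific work is the projection in the x-update, which is itself a least-squares problem. Under the stated assumptions, that is one way to see how a specialized least-squares consistency routine could be reused inside an L1 solver.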
