Towards Accurate Histogram Publication under Differential Privacy

Histograms are the workhorse of data mining and analysis. This paper considers the problem of publishing histograms under differential privacy, one of the strongest privacy models. Existing differentially private histogram publication schemes have shown that clustering (or grouping) is a promising idea to improve the accuracy of sanitized histograms. However, none of them fully exploits the benefit of clustering. In this paper, we introduce a new clustering framework. It features a sophisticated evaluation of the trade-off between the approximation error due to clustering and the Laplace error due to Laplace noise injected, which is normally overlooked in prior work. In particular, we propose three clustering strategies with different orders of run-time complexities. We prove the superiority of our approach by theoretical utility comparisons with the competitors. Our extensive experiments over various standard real-life and synthetic datasets confirm that our technique consistently outperforms existing competitors.

[1]  David P. Doane,et al.  Aesthetic Frequency Classifications , 1976 .

[2]  Dan Suciu,et al.  Boosting the accuracy of differentially private histograms through consistency , 2009, Proc. VLDB Endow..

[3]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[4]  Stavros Papadopoulos,et al.  Practical Differential Privacy via Grouping and Smoothing , 2013, Proc. VLDB Endow..

[5]  Yin Yang,et al.  Differentially Private Histogram Publication , 2012, ICDE.

[6]  Cynthia Dwork,et al.  Privacy, accuracy, and consistency too: a holistic solution to contingency table release , 2007, PODS.

[7]  Herbert A. Sturges,et al.  The Choice of a Class Interval , 1926 .

[8]  Andrew McGregor,et al.  Optimizing linear counting queries under differential privacy , 2009, PODS.

[9]  Suman Nath,et al.  Differentially private aggregation of distributed time-series with transformation and encryption , 2010, SIGMOD Conference.

[10]  Claude Castelluccia,et al.  Differentially Private Histogram Publishing through Lossy Compression , 2012, 2012 IEEE 12th International Conference on Data Mining.

[11]  Yin Yang,et al.  Differentially private histogram publication , 2012, The VLDB Journal.

[12]  Jing Lei,et al.  Differentially Private M-Estimators , 2011, NIPS.

[13]  Yin Yang,et al.  Low-Rank Mechanism: Optimizing Batch Queries under Differential Privacy , 2012, Proc. VLDB Endow..

[14]  Frank McSherry,et al.  Privacy integrated queries: an extensible platform for privacy-preserving data analysis , 2009, SIGMOD Conference.

[15]  Taylor Francis Online,et al.  The American statistician , 1947 .

[16]  Torsten Suel,et al.  Optimal Histograms with Quality Guarantees , 1998, VLDB.

[17]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).