Publishing histograms with outliers under data differential privacy

Histograms are important tools for data mining and analysis, and several differentially private histogram publication schemes have been proposed recently. These schemes show that histogram reconstruction is a promising way to improve the accuracy of published histograms. However, none of them properly considers outliers in the original histogram, which can cause significant reconstruction errors. To address this problem, this paper puts forward Outlier-HistoPub, a publication method for histograms with outliers under differential privacy. The method first processes the count sequence of the original histogram, using a "global sort" to reduce the degree of alternative distribution, a concept proposed in this paper; this may eliminate the influence of outliers during reconstruction. To avoid individual privacy leakage in the reconstruction process, an exponential mechanism is used to select, at each step, the most similar adjacent bins of the uniformity distribution histogram to merge, and the Laplace mechanism is used to generate noise that perturbs the count sequence of the reconstructed histogram. Experiments show that the proposed method can improve the efficiency and accuracy of histogram publication. Copyright © 2016 John Wiley & Sons, Ltd.
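The three steps named in the abstract (sort, merge adjacent bins via an exponential mechanism, perturb with Laplace noise) can be illustrated with a minimal Python sketch. This is not the paper's Outlier-HistoPub algorithm: the function names, the utility function (negative gap between adjacent group means), the utility sensitivity of 1.0, the even split of `eps_merge` across merge steps, and the target group count `k` are all assumptions made for illustration, and the sketch makes no claim about the privacy cost of the sort itself.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exponential_mechanism(utilities, epsilon, sensitivity, rng):
    # Sample index i with probability proportional to
    # exp(epsilon * u_i / (2 * sensitivity)); shift by the max for stability.
    top = max(utilities)
    weights = [math.exp(epsilon * (u - top) / (2.0 * sensitivity))
               for u in utilities]
    return rng.choices(range(len(utilities)), weights=weights, k=1)[0]


def publish_histogram(counts, k, eps_merge, eps_noise, seed=None):
    """Hypothetical sketch: sort, merge down to k groups, add Laplace noise."""
    rng = random.Random(seed)
    # Sort bin indices by count so outliers sit next to similar values.
    order = sorted(range(len(counts)), key=lambda i: counts[i])
    groups = [[i] for i in order]  # each group holds original bin indices
    merges = len(groups) - k
    for _ in range(merges):
        means = [sum(counts[i] for i in g) / len(g) for g in groups]
        # Assumed utility: adjacent groups with closer means score higher.
        utilities = [-abs(means[j] - means[j + 1])
                     for j in range(len(groups) - 1)]
        # Assumed sensitivity 1.0 and an even budget split per merge step.
        j = exponential_mechanism(utilities, eps_merge / merges, 1.0, rng)
        groups[j:j + 2] = [groups[j] + groups[j + 1]]
    published = [0.0] * len(counts)
    for g in groups:
        # One Laplace draw per merged group, then spread the noisy sum evenly.
        noisy_sum = sum(counts[i] for i in g) + laplace_noise(1.0 / eps_noise, rng)
        for i in g:
            published[i] = noisy_sum / len(g)
    return published, groups
```

For example, `publish_histogram([10, 500, 12, 11, 480, 9], k=2, eps_merge=1.0, eps_noise=1.0)` returns one noisy value per original bin, in the original bin order, together with the groups that were merged. Because each merged group receives a single noise draw that is averaged over its bins, larger groups carry less noise per bin at the cost of approximation error inside the group.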
