Chasing Accuracy and Privacy, and Catching Both: A Literature Survey on Differentially Private Histogram Publication

Histograms and synthetic data are of key importance in data analysis. However, researchers have shown that even aggregated data such as histograms, which contain no obviously sensitive attributes, can leak private information. Enabling data analysis without risking unintended privacy violations therefore requires a strong notion of privacy. Differential privacy is such a notion: a statistical definition of privacy that makes privacy leakage quantifiable. The caveat is that its strong privacy guarantees come at a cost in accuracy. Although this trade-off is central to the adoption of differential privacy, the literature lacks a systematic understanding of the trade-off and of how to address it. Through a systematic literature review (SLR), we investigate the state of the art in accuracy-improving differentially private algorithms for histogram and synthetic data publishing. Our contribution is two-fold: 1) we provide an understanding of the problem by crystallizing the categories of accuracy-improving techniques and the core problems they solve, and by investigating how composable the techniques are, and 2) we pave the way for future work. To provide this understanding, we position and visualize the ideas in relation to each other and to external work, and we deconstruct each algorithm to examine its building blocks separately, with the aim of pinpointing which dimension of noise reduction each technique targets. Hence, this systematization of knowledge (SoK) shows in which dimensions, and how, accuracy improvement can be pursued without sacrificing privacy.
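
The privacy-accuracy trade-off described above can be illustrated with a minimal sketch. It assumes the standard Laplace mechanism as a baseline, which the abstract itself does not name, and it assumes neighboring datasets differ by adding or removing one record, so that the L1 sensitivity of a histogram is 1. The function names `laplace_noise`, `dp_histogram`, and `mean_abs_error` and the example counts are hypothetical and chosen only for illustration; this is not one of the surveyed algorithms.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(counts, epsilon, rng, sensitivity=1.0):
    # Assumption: add/remove-one-record neighbors, so sensitivity defaults to 1.
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon  # smaller epsilon -> more noise
    return [c + laplace_noise(scale, rng) for c in counts]


def mean_abs_error(true_counts, noisy_counts):
    # Average absolute deviation per bin between true and noisy counts.
    pairs = zip(true_counts, noisy_counts)
    return sum(abs(t - n) for t, n in pairs) / len(true_counts)


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [120, 45, 8, 0, 310]  # hypothetical bin counts
    for eps in (0.1, 1.0, 10.0):
        noisy = dp_histogram(counts, eps, rng)
        print(f"epsilon={eps}: mean abs error={mean_abs_error(counts, noisy):.2f}")
```

In this sketch the expected per-bin error equals the noise scale, `sensitivity / epsilon`, so a stricter privacy budget (smaller epsilon) directly increases the error. Accuracy-improving techniques of the kind the survey categorizes aim to reduce that error for a fixed epsilon.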
