Differentially Private Data Publishing and Analysis: A Survey

Differential privacy is an essential and prevalent privacy model that has been widely explored in recent decades. This survey provides a comprehensive and structured overview of two research directions: differentially private data publishing and differentially private data analysis. For data publishing, we compare the release mechanisms proposed for a variety of input data types in terms of query type, maximum number of queries, efficiency, and accuracy. For data analysis, we identify two basic frameworks and list the typical algorithms used within each, then compare and discuss their results in terms of output accuracy and efficiency. Finally, we propose several directions for future research and potential applications.
