Differential Privacy and Machine Learning: a Survey and Review

The objective of machine learning is to extract useful information from data, while privacy is preserved by concealing information, so the two goals appear difficult to reconcile. Nevertheless, they must frequently be balanced when mining sensitive data. Medical research is a prominent example, where it is necessary both to extract useful knowledge and to protect patient privacy. One way to resolve the conflict is to extract general characteristics of whole populations without disclosing the private information of any individual. In this paper, we consider differential privacy, one of the most popular and powerful definitions of privacy. We explore the interplay between machine learning and differential privacy, namely privacy-preserving machine learning algorithms and learning-based data release mechanisms. We also describe theoretical results that characterize what can be learned differentially privately and upper bounds on the loss incurred by differentially private algorithms. Finally, we present several open questions, including how to incorporate public data, how to handle missing data in private datasets, and whether, as the number of observed samples grows arbitrarily large, differentially private machine learning algorithms can match the utility of their non-private counterparts.
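To make the guarantee sketched above concrete, the following is a minimal statement of the standard $\epsilon$-differential privacy definition and of the Laplace mechanism that achieves it, both due to Dwork and co-authors; the symbols $M$, $D$, $D'$, $f$, and $\Delta f$ are illustrative notation rather than notation drawn from this survey.

A randomized mechanism $M$ is $\epsilon$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in a single record and every set of outputs $S \subseteq \mathrm{Range}(M)$,
$$\Pr[M(D) \in S] \;\le\; e^{\epsilon} \, \Pr[M(D') \in S].$$
For a numeric query $f$ with global sensitivity $\Delta f = \max_{D \sim D'} \lVert f(D) - f(D') \rVert_1$, releasing $M(D) = f(D) + \mathrm{Lap}(\Delta f / \epsilon)$ satisfies $\epsilon$-differential privacy. For example, a count query has $\Delta f = 1$, so adding noise drawn from $\mathrm{Lap}(1/\epsilon)$ suffices: the population-level count is reported accurately while any single individual's presence or absence is masked.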
