A Critical Review on the Use (and Misuse) of Differential Privacy in Machine Learning

We review the use of differential privacy (DP) for privacy protection in machine learning (ML). We show that, driven by the aim of preserving the accuracy of the learned models, DP-based ML implementations adopt privacy budgets so loose that they no longer offer the ex ante privacy guarantees of DP. Instead, what they deliver is essentially noise addition similar to the traditional (and often criticized) statistical disclosure control approach. Due to the lack of formal privacy guarantees, the actual level of privacy offered must be experimentally assessed ex post, which is seldom done. In this respect, we present empirical results showing that standard anti-overfitting techniques in ML can achieve a better utility/privacy/efficiency tradeoff than DP.
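To make concrete the kind of noise addition that DP-based ML implementations deliver, the following is a minimal sketch of the noisy-gradient mechanism popularized by DP-SGD (Abadi et al., CCS 2016): per-example gradient clipping followed by Gaussian noise calibrated to the clipping bound, shown here for logistic regression. The function name and hyperparameter values are illustrative assumptions, not taken from the paper under review.

```python
# Minimal sketch of a DP-SGD-style noisy gradient step for logistic
# regression. Hyperparameters (lr, clip_norm, noise_multiplier) are
# illustrative; in practice they drive the (epsilon, delta) guarantee.
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    """One step: clip each per-example gradient to L2 norm clip_norm,
    average, and add Gaussian noise scaled to the clipping bound."""
    preds = 1.0 / (1.0 + np.exp(-X @ w))            # sigmoid predictions
    per_example_grads = (preds - y)[:, None] * X    # logistic-loss gradients
    # Clip each example's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    # Gaussian noise with scale noise_multiplier * clip_norm (the
    # sensitivity of the clipped gradient sum), divided by the batch size.
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=w.shape) / len(X)
    return w - lr * (clipped.mean(axis=0) + noise)
```

The point of the sketch is that the formal guarantee hinges entirely on the noise scale: when the noise multiplier is lowered to preserve model accuracy, the resulting privacy budget grows and the mechanism degenerates into plain noise addition without meaningful ex ante guarantees.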

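Since the abstract argues that the actual privacy level must then be assessed ex post, a sketch of one such empirical assessment may help: a loss-threshold membership inference attack in the spirit of Yeom et al. (CSF 2018), which exploits the gap between a model's loss on training records and on unseen records. The threshold choice and the synthetic losses below are illustrative assumptions for the demo only.

```python
# Minimal sketch of an ex post privacy assessment via a loss-threshold
# membership inference attack. The attacker guesses "member" whenever
# the model's loss on a record falls below a threshold.
import numpy as np

def membership_advantage(loss_members, loss_nonmembers, threshold):
    """Advantage = true-positive rate - false-positive rate of the
    attack; 0 means no measurable leakage, 1 means perfect inference."""
    tpr = np.mean(loss_members < threshold)      # members correctly flagged
    fpr = np.mean(loss_nonmembers < threshold)   # non-members wrongly flagged
    return tpr - fpr

# Demo with synthetic losses: an overfit model yields lower losses on
# its training records than on unseen ones, so the attack succeeds.
rng = np.random.default_rng(1)
loss_members = rng.exponential(0.2, size=1000)
loss_nonmembers = rng.exponential(0.5, size=1000)
adv = membership_advantage(loss_members, loss_nonmembers,
                           threshold=np.median(loss_nonmembers))
print(f"empirical membership advantage: {adv:.2f}")
```

A small advantage under such attacks is the ex post evidence of privacy that the paper argues loose DP budgets cannot provide ex ante; it is also why anti-overfitting techniques, which shrink the member/non-member loss gap directly, can compete with DP on this measure.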