Auditing Differentially Private Machine Learning: How Private is Private SGD?

We investigate whether Differentially Private SGD (DP-SGD) offers better privacy in practice than what is guaranteed by its state-of-the-art analysis. We do so via novel data poisoning attacks, which we show correspond to realistic privacy attacks. While previous work (Ma et al., arXiv 2019) [23] proposed differential privacy as a defense against data poisoning, our use of this connection as a tool for understanding the privacy of a specific mechanism is new. More generally, our work takes a quantitative, empirical approach to understanding the privacy afforded by specific implementations of differentially private algorithms, an approach that we believe has the potential to complement and influence analytical work on differential privacy.
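
To make the auditing approach concrete, the following is a minimal sketch (not the authors' released code) of how a statistical lower bound on epsilon can be computed from the outcomes of a poisoning-based distinguishing attack. It assumes the standard (epsilon, delta)-DP hypothesis-testing bound, under which any attack's false-positive rate fp and false-negative rate fn must satisfy fp + exp(epsilon) * fn >= 1 - delta, and it uses one-sided Clopper-Pearson confidence limits [29] on fp and fn; the function names and the SciPy dependency are illustrative choices, not part of the paper.

    # Hypothetical auditing sketch. The attack itself (training many
    # DP-SGD models with and without poisoned points, then guessing
    # which case produced each model) is assumed to run externally;
    # this only turns its observed error counts into an epsilon bound.
    import math
    from scipy.stats import beta

    def clopper_pearson_upper(errors, trials, conf=0.95):
        """One-sided upper confidence limit on a binomial error rate [29]."""
        if errors >= trials:
            return 1.0
        return beta.ppf(conf, errors + 1, trials - errors)

    def empirical_epsilon_lower_bound(false_pos, false_neg, trials,
                                      delta=1e-5, conf=0.95):
        """Lower bound on epsilon implied by the attack's performance.

        (epsilon, delta)-DP forces fp + exp(epsilon) * fn >= 1 - delta
        for every distinguishing attack, hence
        epsilon >= ln((1 - delta - fp) / fn). Substituting upper
        confidence limits for fp and fn keeps the bound conservative.
        """
        fp_hi = clopper_pearson_upper(false_pos, trials, conf)
        fn_hi = clopper_pearson_upper(false_neg, trials, conf)
        if (1.0 - delta - fp_hi) <= fn_hi:
            return 0.0  # attack too weak to certify any leakage
        return math.log((1.0 - delta - fp_hi) / fn_hi)

    # Example: 2 false positives and 5 false negatives over 500 trials
    # per world certifies a nontrivial epsilon lower bound.
    print(empirical_epsilon_lower_bound(false_pos=2, false_neg=5, trials=500))

If the certified lower bound approaches the epsilon promised by the analytical DP-SGD accounting, that analysis is nearly tight; a large gap means either the analysis is loose or the attack is suboptimal.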

[1]  Úlfar Erlingsson et al. The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks. USENIX Security Symposium, 2018.

[2]  Aaron Roth et al. Gaussian Differential Privacy. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2019.

[3]  Úlfar Erlingsson et al. Prochlo: Strong Privacy for Analytics in the Crowd. SOSP, 2017.

[4]  Shruti Tople et al. An Empirical Study on the Intrinsic Privacy of SGD. 2019.

[5]  Li Zhang et al. Rényi Differential Privacy of the Sampled Gaussian Mechanism. arXiv, 2019.

[6]  Alex Krizhevsky et al. Learning Multiple Layers of Features from Tiny Images. 2009.

[7]  S. Nelson et al. Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays. PLoS Genetics, 2008.

[8]  Thomas Steinke et al. Composable and Versatile Privacy via Truncated CDP. STOC, 2018.

[9]  Shuang Song et al. Making the Shoe Fit: Architectures, Initializations, and Tuning for Learning with Privacy. 2019.

[10]  Guy N. Rothblum et al. Concentrated Differential Privacy. arXiv, 2016.

[11]  Anna C. Gilbert et al. Property Testing for Differential Privacy. Allerton Conference on Communication, Control, and Computing, 2018.

[12]  Yoshua Bengio et al. Understanding the Difficulty of Training Deep Feedforward Neural Networks. AISTATS, 2010.

[13]  Tudor Dumitras et al. On the Effectiveness of Mitigating Data Poisoning Attacks with Gradient Shaping. arXiv, 2020.

[14]  Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[15]  Anand D. Sarwate et al. Stochastic Gradient Descent with Differentially Private Updates. IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2013.

[16]  Thomas Steinke et al. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. TCC, 2016.

[17]  Somesh Jha et al. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting. IEEE Computer Security Foundations Symposium (CSF), 2018.

[18]  Calton Pu et al. Differentially Private Model Publishing for Deep Learning. IEEE Symposium on Security and Privacy (SP), 2019.

[19]  Ian Goodfellow et al. Deep Learning with Differential Privacy. CCS, 2016.

[20]  Moni Naor et al. Our Data, Ourselves: Privacy via Distributed Noise Generation. EUROCRYPT, 2006.

[21]  H. Storkel. Learning New Words. 2001.

[22]  Cordelia Schmid et al. Radioactive Data: Tracing Through Training. ICML, 2020.

[23]  Yuzhe Ma et al. Data Poisoning against Differentially-Private Learners: Attacks and Defenses. IJCAI, 2019.

[24]  Úlfar Erlingsson et al. Scalable Private Learning with PATE. ICLR, 2018.

[25]  Ilya Mironov. Rényi Differential Privacy. IEEE Computer Security Foundations Symposium (CSF), 2017.

[26]  David Evans et al. Evaluating Differentially Private Machine Learning in Practice. USENIX Security Symposium, 2019.

[27]  Brendan Dolan-Gavitt et al. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. arXiv, 2017.

[28]  Shruti Tople et al. On the Intrinsic Privacy of Stochastic Gradient Descent. arXiv, 2019.

[29]  C. J. Clopper and E. S. Pearson. The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial. Biometrika, 1934.

[30]  Raef Bassily et al. Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. arXiv:1405.7085, 2014.

[31]  Roland Vollgraf et al. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv, 2017.

[32]  Vitaly Shmatikov et al. Membership Inference Attacks Against Machine Learning Models. IEEE Symposium on Security and Privacy (SP), 2017.

[33]  Úlfar Erlingsson et al. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. CCS, 2014.

[34]  Danfeng Zhang et al. Detecting Violations of Differential Privacy. CCS, 2018.

[35]  Irit Dinur et al. Revealing Information While Preserving Privacy. PODS, 2003.

[36]  Ashwin Machanavajjhala et al. Utility Cost of Formal Privacy for Releasing National Employer-Employee Statistics. SIGMOD, 2017.

[37]  Michael I. Jordan et al. Genomic Privacy and Limits of Individual Detection in a Pool. Nature Genetics, 2009.

[38]  Thomas Steinke et al. Robust Traceability from Trace Amounts. IEEE Symposium on Foundations of Computer Science (FOCS), 2015.