Membership Inference Attacks Against Machine Learning Models

We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
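To make the attack described above concrete, the following is a minimal, hypothetical sketch of the shadow-model idea: models whose training membership is known to the adversary supply labeled prediction vectors, and a binary attack classifier learns to separate members from non-members. Everything here is an illustrative assumption rather than the paper's actual implementation: the scikit-learn-style `predict_proba` interface, the sorted-probability features, and the single attack model (which simplifies away details such as per-class attack models) are all stand-ins.

```python
# Hypothetical sketch of a black-box membership inference attack via shadow models.
# Assumes numpy and scikit-learn; all names are illustrative, not the paper's code.

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def attack_features(prediction_vector):
    # Sort class probabilities in descending order so the attack features
    # do not depend on class ordering (a simplifying assumption).
    return np.sort(prediction_vector)[::-1]


def build_attack_training_set(shadow_models, shadow_splits):
    # Each shadow split is (members, non_members): records the shadow model
    # did / did not train on. Label the resulting prediction vectors 1 / 0.
    features, labels = [], []
    for model, (members, non_members) in zip(shadow_models, shadow_splits):
        for x in members:
            features.append(attack_features(model.predict_proba([x])[0]))
            labels.append(1)
        for x in non_members:
            features.append(attack_features(model.predict_proba([x])[0]))
            labels.append(0)
    return np.array(features), np.array(labels)


def train_attack_model(shadow_models, shadow_splits):
    # The attack model is just a binary classifier over prediction vectors.
    X, y = build_attack_training_set(shadow_models, shadow_splits)
    attack = RandomForestClassifier(n_estimators=100)
    attack.fit(X, y)
    return attack


def infer_membership(attack, target_model, record):
    # Query the target model (black-box) and predict member vs. non-member.
    f = attack_features(target_model.predict_proba([record])[0]).reshape(1, -1)
    return bool(attack.predict(f)[0])
```

In this sketch the adversary never sees the target model's parameters: it only queries `predict_proba`, mirroring the black-box setting, and relies on the shadow models behaving similarly to the target so that the member/non-member prediction patterns transfer.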
