Differential Privacy Protection Against Membership Inference Attack on Machine Learning for Genomic Data

Machine learning is a powerful tool for modeling massive genomic data, while genome privacy is a growing concern. Studies have shown that not only the raw data but also a trained model can potentially infringe genome privacy. One example is the membership inference attack (MIA). In an MIA, an adversary who can only query a given target model, without knowing its internal parameters, determines whether a specific record was included in the target model's training dataset. Differential privacy (DP) has been used to defend against MIA with a rigorous privacy guarantee. In this paper, we investigate the vulnerability of machine learning to MIA on genomic data and evaluate the effectiveness of DP as a defense mechanism. We consider two widely used machine learning models, Lasso and a convolutional neural network (CNN), as target models. We study the trade-off between the defense power against MIA and the prediction accuracy of the target model under various DP privacy settings. Our results show that the relationship between the privacy budget and target model accuracy can be modeled as a log-like curve. Thus, a smaller privacy budget provides a stronger privacy guarantee at the cost of a larger loss in model accuracy. We also investigate the effect of model sparsity on model vulnerability to MIA. Our results demonstrate that, in addition to preventing overfitting, model sparsity can work together with DP to significantly mitigate the risk of MIA.
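The abstract does not say which DP mechanism is applied to Lasso or the CNN. The sketch below shows only how a privacy budget ε controls the amount of noise, and it is not the authors' training procedure. It assumes the textbook Laplace mechanism, which releases a numeric query answer plus noise of scale Δf/ε. Here Δf is the query's sensitivity and is taken as a given input. The helper names (`laplace_noise`, `laplace_mechanism`) and the count value are hypothetical.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    # u ~ Uniform(-0.5, 0.5); reject the single endpoint that would give log(0).
    u = rng.random() - 0.5
    while u == -0.5:
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_value under epsilon-DP with Laplace(sensitivity / epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon (privacy budget) must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    true_count = 120.0  # hypothetical count query, so the sensitivity is 1
    for eps in (0.1, 0.5, 1.0, 5.0, 10.0):
        trials = [laplace_mechanism(true_count, 1.0, eps, rng) for _ in range(10_000)]
        mae = sum(abs(t - true_count) for t in trials) / len(trials)
        print(f"epsilon={eps:>5}: mean absolute error ~ {mae:.3f}")
```

The expected absolute value of Laplace noise equals its scale, so the error here shrinks roughly as 1/ε. This illustrates the direction of the privacy/accuracy trade-off described above. It does not reproduce the log-like accuracy curve the authors measured for their models.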
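The abstract describes MIA only as black-box querying of the target model. One simple instantiation thresholds the model's confidence on a queried record. This is assumed here for illustration and is not necessarily the attack the authors used. Records on which the model is unusually confident are guessed to be training members, which is one reason overfitting tends to raise MIA risk. The function names, the threshold, and the confidence values below are hypothetical and are not data from the paper.

```python
from typing import Sequence


def infer_membership(confidence: float, threshold: float) -> bool:
    """Guess 'member' when the target model is at least `threshold` confident."""
    return confidence >= threshold


def evaluate_attack(member_conf: Sequence[float],
                    nonmember_conf: Sequence[float],
                    threshold: float) -> dict:
    """Score the threshold attack on records whose true membership is known."""
    tp = sum(infer_membership(c, threshold) for c in member_conf)
    fp = sum(infer_membership(c, threshold) for c in nonmember_conf)
    tpr = tp / len(member_conf)
    fpr = fp / len(nonmember_conf)
    correct = tp + (len(nonmember_conf) - fp)
    accuracy = correct / (len(member_conf) + len(nonmember_conf))
    # Membership advantage (TPR - FPR): 0 means the attack does no better than chance.
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr, "advantage": tpr - fpr}


if __name__ == "__main__":
    # Hypothetical confidences returned by black-box queries.
    members = [0.99, 0.97, 0.95, 0.80]
    nonmembers = [0.90, 0.70, 0.60, 0.55]
    print(evaluate_attack(members, nonmembers, threshold=0.85))
```

A DP-trained or sparser target model would be expected to narrow the gap between member and non-member confidences. That would push the measured advantage toward 0, which is the effect the defense aims for.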
