Efficient differentially private learning improves drug sensitivity prediction

BackgroundUsers of a personalised recommendation system face a dilemma: recommendations can be improved by learning from data, but only if other users are willing to share their private information. Good personalised predictions are vitally important in precision medicine, but genomic information on which the predictions are based is also particularly sensitive, as it directly identifies the patients and hence cannot easily be anonymised. Differential privacy has emerged as a potentially promising solution: privacy is considered sufficient if presence of individual patients cannot be distinguished. However, differentially private learning with current methods does not improve predictions with feasible data sizes and dimensionalities.ResultsWe show that useful predictors can be learned under powerful differential privacy guarantees, and even from moderately-sized data sets, by demonstrating significant improvements in the accuracy of private drug sensitivity prediction with a new robust private regression method. Our method matches the predictive accuracy of the state-of-the-art non-private lasso regression using only 4x more samples under relatively strong differential privacy guarantees. Good performance with limited data is achieved by limiting the sharing of private information by decreasing the dimensionality and by projecting outliers to fit tighter bounds, therefore needing to add less noise for equal privacy.ConclusionsThe proposed differentially private regression method combines theoretical appeal and asymptotic efficiency with good prediction accuracy even with moderate-sized data. As already the simple-to-implement method shows promise on the challenging genomic data, we anticipate rapid progress towards practical applications in many fields.ReviewersThis article was reviewed by Zoltan Gaspari and David Kreil.

[1]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[2]  Dustin Tran,et al.  Automatic Differentiation Variational Inference , 2016, J. Mach. Learn. Res..

[3]  Eran Halperin,et al.  Identifying Personal Genomes by Surname Inference , 2013, Science.

[4]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[5]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[6]  Yang Liu,et al.  Collaborative Security , 2015, ACM Comput. Surv..

[7]  Martin J. Wainwright,et al.  Privacy Aware Learning , 2012, JACM.

[8]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[9]  Mark W. Schmidt,et al.  Optimizing Costly Functions with Simple Constraints: A Limited-Memory Projected Quasi-Newton Algorithm , 2009, AISTATS.

[10]  S. Ramaswamy,et al.  Systematic identification of genomic markers of drug sensitivity in cancer cells , 2012, Nature.

[11]  L. Wasserman,et al.  A Statistical Framework for Differential Privacy , 2008, 0811.2501.

[12]  P. Diaconis,et al.  Conjugate Priors for Exponential Families , 1979 .

[13]  Carl A. Gunter,et al.  Privacy in the Genomic Era , 2014, ACM Comput. Surv..

[14]  Somesh Jha,et al.  Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing , 2014, USENIX Security Symposium.

[15]  Aaron Roth,et al.  The Algorithmic Foundations of Differential Privacy , 2014, Found. Trends Theor. Comput. Sci..

[16]  Seth Pettie,et al.  Linear-Time Approximation for Maximum Weight Matching , 2014, JACM.

[17]  S. Nelson,et al.  Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays , 2008, PLoS genetics.

[18]  Stephen E. Fienberg,et al.  Learning with Differential Privacy: Stability, Learnability and the Sufficiency and Necessity of ERM Principle , 2015, J. Mach. Learn. Res..

[19]  R. Muirhead Aspects of Multivariate Statistical Theory , 1982, Wiley Series in Probability and Statistics.

[20]  Stephen E. Fienberg,et al.  Privacy-Preserving Data Sharing for Genome-Wide Association Studies , 2012, J. Priv. Confidentiality.

[21]  Craig Gentry,et al.  Fully homomorphic encryption using ideal lattices , 2009, STOC '09.

[22]  James R. Foulds,et al.  On the Theory and Practice of Privacy-Preserving Bayesian Data Analysis , 2016, UAI.

[23]  M. Gerstein,et al.  Quantification of private information leakage from phenotype-genotype data: linking attacks , 2016, Nature Methods.

[24]  Christos Dimitrakakis,et al.  On the Differential Privacy of Bayesian Inference , 2015, AAAI.

[25]  Laura M. Heiser,et al.  A community effort to assess and improve drug sensitivity prediction algorithms , 2014, Nature Biotechnology.

[26]  Craig Gentry,et al.  A fully homomorphic encryption scheme , 2009 .

[27]  Kamalika Chaudhuri,et al.  Privacy-preserving logistic regression , 2008, NIPS.

[28]  Frank McSherry,et al.  Probabilistic Inference and Differential Privacy , 2010, NIPS.

[29]  Yin Yang,et al.  Functional Mechanism: Regression Analysis under Differential Privacy , 2012, Proc. VLDB Endow..

[30]  Bonnie Berger,et al.  Realizing privacy preserving genome-wide association studies , 2016, Bioinform..

[31]  Nci Dream Community A community effort to assess and improve drug sensitivity prediction algorithms , 2014 .

[32]  Xiaoqian Jiang,et al.  Differentially private distributed logistic regression using private and public data , 2014, BMC Medical Genomics.

[33]  Sridhar Ramaswamy,et al.  Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells , 2012, Nucleic Acids Res..

[34]  Stephen E. Fienberg,et al.  Scalable privacy-preserving data sharing methodology for genome-wide association studies , 2014, J. Biomed. Informatics.

[35]  Zhicong Huang,et al.  Differential Privacy with Bounded Priors: Reconciling Utility and Privacy in Genome-Wide Association Studies , 2015, CCS.

[36]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[37]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[38]  Zhihua Zhang,et al.  Wishart Mechanism for Differentially Private Principal Components Analysis , 2015, AAAI.

[39]  Jeffrey F. Naughton,et al.  Revisiting Differentially Private Regression: Lessons From Learning Theory and their Consequences , 2015, ArXiv.

[40]  Adam D. Smith,et al.  Efficient, Differentially Private Point Estimators , 2008, ArXiv.

[41]  John Salvatier,et al.  Probabilistic programming in Python using PyMC3 , 2016, PeerJ Comput. Sci..