Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing

We initiate the study of privacy in pharmacogenetics, wherein machine learning models are used to guide medical treatments based on a patient's genotype and background. Performing an in-depth case study on privacy in personalized warfarin dosing, we show that suggested models carry privacy risks, in particular because attackers can perform what we call model inversion: an attacker, given the model and some demographic information about a patient, can predict the patient's genetic markers. As differential privacy (DP) is an oft-proposed solution for medical settings such as this, we evaluate its effectiveness for building private versions of pharmacogenetic models. We show that DP mechanisms prevent our model inversion attacks when the privacy budget is carefully selected. We go on to analyze the impact on utility by performing simulated clinical trials with DP dosing models. We find that for privacy budgets effective at preventing attacks, patients would be exposed to increased risk of stroke, bleeding events, and mortality. We conclude that current DP mechanisms do not simultaneously improve genomic privacy while retaining desirable clinical efficacy, highlighting the need for new mechanisms that should be evaluated in situ using the general methodology introduced by our work.
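The model inversion idea described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the linear dosing model and its coefficients, the genotype prior, the Gaussian error model, and the premise that the attacker also observes the patient's prescribed dose. None of these values come from the paper. The sketch scores each candidate genotype by how well the model's prediction matches the observed dose, weighted by the prior, and returns the most likely candidate.

```python
import math

# Hypothetical prior over the number of variant alleles at one locus.
# These frequencies are illustrative, not taken from any dataset.
GENOTYPE_PRIOR = {0: 0.4, 1: 0.4, 2: 0.2}


def predict_dose(age_decades, height_cm, variant_alleles):
    """Toy linear dosing model; the coefficients are made up for illustration."""
    return 5.0 - 0.25 * age_decades + 0.01 * height_cm - 0.85 * variant_alleles


def invert_genotype(age_decades, height_cm, observed_dose, noise_sd=0.5):
    """Guess the genotype that best explains the observed dose.

    Each candidate genotype is scored by its prior probability times a
    Gaussian likelihood of the gap between the observed dose and the
    model's prediction for that candidate (a MAP estimate under these
    assumed distributions).
    """
    scores = {}
    for genotype, prior in GENOTYPE_PRIOR.items():
        error = observed_dose - predict_dose(age_decades, height_cm, genotype)
        likelihood = math.exp(-(error ** 2) / (2 * noise_sd ** 2))
        scores[genotype] = prior * likelihood
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Simulate a patient with two variant alleles, then try to recover that value.
    dose = predict_dose(age_decades=6, height_cm=170, variant_alleles=2)
    print(invert_genotype(age_decades=6, height_cm=170, observed_dose=dose))
```

In this toy setting the attacker needs only the published model and a few non-genetic attributes. A differentially private variant would perturb the model's coefficients, which is the setting in which the privacy and utility trade-off discussed above arises.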
