Infringement of Individual Privacy via Mining Differentially Private GWAS Statistics

Individual privacy in the genomic era is a growing concern as more individuals have their genomes sequenced or genotyped. Genetic privacy can be infringed even without access to raw genotypes or sequencing data. Studies have reported that summary statistics from Genome-Wide Association Studies (GWAS) can be exploited to threaten individual privacy. In this study, we show that even differentially private GWAS statistics still carry a risk of leaking individual privacy. Specifically, we constructed a Bayesian network by mining public GWAS statistics and evaluated two attacks, a trait inference attack and an identity inference attack, that infringe the privacy not only of GWAS participants but also of regular individuals. We evaluated our methods on both simulated data and real human genetic data from the 1000 Genomes Project. Our results demonstrate that unexpected privacy breaches can occur and that attackers can use these attacks to derive identity information and other private information. Hence, more methodological work is needed to understand both the infringement and the protection of genetic privacy.
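The general shape of a trait inference attack on noisy statistics can be illustrated with a minimal sketch. This is not the Bayesian network used in the study: it assumes a naive Bayes simplification (SNPs independent given the trait), Hardy-Weinberg genotype proportions, and risk-allele counts released through a Laplace mechanism with sensitivity 2. All function names and parameters (`dp_allele_frequency`, `trait_posterior`, `prevalence`, and so on) are hypothetical and chosen only for illustration.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variates with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_allele_frequency(allele_count, n_individuals, epsilon, rng):
    """Release a risk-allele frequency perturbed by Laplace noise.

    Assumption: replacing one diploid individual changes the allele
    count by at most 2, so the noise scale is 2 / epsilon.
    """
    noisy_count = allele_count + laplace_noise(2.0 / epsilon, rng)
    freq = noisy_count / (2.0 * n_individuals)
    # Clamp so that later logarithms are well defined.
    return min(max(freq, 1e-6), 1.0 - 1e-6)


def genotype_likelihood(g, freq):
    # Probability of carrying g in {0, 1, 2} risk alleles under
    # Hardy-Weinberg equilibrium (an assumption of this sketch).
    return math.comb(2, g) * freq ** g * (1.0 - freq) ** (2 - g)


def trait_posterior(genotypes, case_freqs, control_freqs, prevalence):
    """Naive Bayes estimate of P(trait | genotypes).

    Assumes SNPs are conditionally independent given the trait, which is
    a simplification of a full Bayesian network.
    """
    log_case = math.log(prevalence)
    log_ctrl = math.log(1.0 - prevalence)
    for g, f_case, f_ctrl in zip(genotypes, case_freqs, control_freqs):
        log_case += math.log(genotype_likelihood(g, f_case))
        log_ctrl += math.log(genotype_likelihood(g, f_ctrl))
    # Normalize in log space to avoid underflow with many SNPs.
    m = max(log_case, log_ctrl)
    p_case = math.exp(log_case - m)
    p_ctrl = math.exp(log_ctrl - m)
    return p_case / (p_case + p_ctrl)
```

In this sketch an attacker would call `dp_allele_frequency` only to simulate what a data holder publishes. The attack itself is `trait_posterior`, which needs nothing beyond the published case and control frequencies, a population prevalence, and the target's genotypes at the reported SNPs.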
