An overview of human genetic privacy

Human genomics is becoming a Big Data science. Recent biotechnological advances have made millions of personal genome sequences available, and these can be combined with biometric measurements from mobile apps and fitness trackers and with human behavior data collected from mobile devices and social media. As data sharing opens more opportunities for integrative genomic studies, genetic privacy emerges as a legitimate yet challenging concern that must be addressed carefully, not only for individuals but also for their families. In this paper, we present potential genetic privacy risks and the relevant ethics and regulations for sharing and protecting human genomics data. We also describe techniques for protecting human genetic privacy from three broad perspectives: controlled access, differential privacy, and cryptographic solutions.
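
To make the differential privacy perspective concrete, the following is a minimal sketch of the standard Laplace mechanism applied to a carrier count in a cohort. It is an illustration under stated assumptions, not a method taken from this paper: the names `laplace_noise` and `private_count`, the 0/1 encoding of carrier status, and the choice of epsilon are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(carrier_flags, epsilon, rng=None):
    """Release a noisy count of variant carriers (hypothetical helper).

    carrier_flags: list of 0/1 values, 1 meaning the person carries the
                   variant (an assumed encoding, for illustration only).
    epsilon:       privacy budget; smaller values add more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(carrier_flags)
    # Adding or removing one participant changes the count by at most 1,
    # so the sensitivity of this query is 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    cohort = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]  # toy data, not real genotypes
    print(private_count(cohort, epsilon=0.5))
```

The released value is the true count plus noise whose scale grows as epsilon shrinks, so a stricter privacy budget gives a less accurate statistic. Each additional release about the same cohort consumes further budget.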
