Review: Routes for breaching and protecting genetic privacy

We are entering an era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing genetic datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, there is growing concern about whether the genetic privacy of the data originators can be protected. Here, we map the technical threats to genetic privacy and discuss potential mitigation strategies for the privacy-preserving dissemination of genetic data.