Countering GATTACA: efficient and secure testing of fully-sequenced human genomes

Recent advances in DNA sequencing technologies have put ubiquitous availability of fully sequenced human genomes within reach. It is no longer hard to imagine the day when everyone will have the means to obtain and store their own DNA sequence. Widespread and affordable availability of fully sequenced genomes immediately opens up important opportunities in a number of health-related fields. In particular, common genomic applications and tests performed in vitro today will soon be conducted computationally, using digitized genomes. New applications will be developed as genome-enabled medicine becomes increasingly preventive and personalized. However, this progress also raises significant privacy challenges associated with potential loss, theft, or misuse of genomic data. In this paper, we begin to address genomic privacy by focusing on three important applications: Paternity Tests, Personalized Medicine, and Genetic Compatibility Tests. After carefully analyzing these applications and their privacy requirements, we propose a set of efficient techniques based on private set operations. These techniques allow us to implement in silico, in a secure fashion, some operations that are currently performed via in vitro methods. Experimental results demonstrate that the proposed techniques are both feasible and practical today.
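To make the notion of a private set operation concrete, the following is a minimal sketch of one such operation, private set intersection cardinality, built on commutative exponentiation. It is an illustration only and not the protocol proposed in the paper: the modulus `P`, the marker strings such as `"rs2:G"`, and all function names are assumptions chosen for the example, and the parameters are far too weak for real use. Both parties are simulated in one process; in a real run only the blinded, shuffled lists would be exchanged.

```python
import hashlib
import math
import random
import secrets

# Toy modulus (the Mersenne prime 2^127 - 1). Assumption for illustration only;
# a real deployment would use a standardized, much larger prime-order group.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an item to an element of {1, ..., P-1} via SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)


def random_exponent() -> int:
    """Pick a secret exponent coprime to P-1, so x -> x^e is a permutation."""
    while True:
        e = secrets.randbelow(P - 1)
        if e > 1 and math.gcd(e, P - 1) == 1:
            return e


def blind(values, exponent):
    """Raise every value to a secret exponent and shuffle to hide ordering."""
    out = [pow(v, exponent, P) for v in values]
    random.SystemRandom().shuffle(out)
    return out


def intersection_cardinality(client_items, server_items) -> int:
    """Simulate both parties; the client learns only the size of the overlap."""
    a = random_exponent()  # client secret
    b = random_exponent()  # server secret

    # Client -> server: H(c)^a for each client item.
    client_msg = blind([hash_to_group(c) for c in client_items], a)
    # Server -> client: (H(c)^a)^b, reshuffled, plus H(s)^b for its own items.
    server_reply = blind(client_msg, b)
    server_msg = blind([hash_to_group(s) for s in server_items], b)

    # Client raises the server's values to a; equal items collide as H(x)^(ab).
    server_double = {pow(v, a, P) for v in server_msg}
    return sum(1 for v in server_reply if v in server_double)


if __name__ == "__main__":
    client = {"rs1:A", "rs2:G", "rs3:T"}   # hypothetical marker encodings
    server = {"rs2:G", "rs3:T", "rs4:C"}
    print(intersection_cardinality(client, server))
```

Because the server reshuffles the doubly blinded values before returning them, the client can count matches but cannot tell which of its own items matched; this is the property that distinguishes a cardinality-only test from a plain set intersection.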
