A novel, privacy-preserving cryptographic approach for sharing sequencing data

OBJECTIVE: DNA samples are often processed and sequenced at facilities external to the point of collection. These samples are routinely labeled with patient identifiers or pseudonyms. If the data are intercepted during transmission, these labels could allow them to be linked to an individual's identity and private clinical information. We present a cryptographic scheme for securely transmitting externally generated sequence data. The scheme does not require patient identifiers, a public key infrastructure, or the transmission of passwords.

MATERIALS AND METHODS: This novel encryption scheme protects participant sequence data using a shared secret key derived from a unique subset of the individual's own genetic sequence. Full access to the transmitted sequence data therefore requires access to that subset of the individual's sequence, which also helps prevent sample mismatches.

RESULTS: We validate that the proposed scheme is robust to sequencing errors. We also validate that the genotype subsets used to derive keys are unique across the population and distinguish between siblings, and that the scheme provides sufficient cryptographic key space.

DISCUSSION: Unlocking the full sequence requires both a set of the individual's genotypes and a mutually agreed-upon cryptographic seed, which provides additional sample authentication and authorization security. The fixed and marginal costs of implementing this transmission architecture are modest.

CONCLUSIONS: Genomics researchers who sequence participant samples externally can protect the transmission of sequence data using unique features of each individual's genetic sequence.
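The abstract does not specify how the shared key is computed from the genotypes and the seed. The following is a minimal sketch that shows the general idea under explicit assumptions:

- Both the sequencing facility and the researcher hold identical, unphased genotype calls for the individual at an agreed panel of SNPs.
- HKDF-SHA256 (RFC 5869) is used as the key-derivation function, with the mutually agreed seed as the HKDF salt.

The panel identifiers, the seed value, the `info` label, and the function names `hkdf_sha256`, `canonical_genotype`, and `derive_transfer_key` are hypothetical illustrations, not the authors' construction. Encrypting the sequence file with the derived key (for example, with an authenticated cipher) is not shown.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 as specified in RFC 5869 (extract, then expand)."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("HKDF-SHA256 output is limited to 255 * 32 bytes")
    # Extract step: PRK = HMAC(salt, IKM).
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand step: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def canonical_genotype(call: str) -> str:
    """Normalize an unphased diploid SNP call ("AG", "G/A", "a|g") to sorted alleles ("AG")."""
    alleles = call.replace("/", "").replace("|", "").upper()
    if len(alleles) != 2 or any(a not in "ACGT" for a in alleles):
        raise ValueError(f"unsupported genotype call: {call!r}")
    return "".join(sorted(alleles))


def derive_transfer_key(genotypes: dict[str, str], panel: list[str], seed: bytes) -> bytes:
    """Derive a 256-bit key from the genotypes at an agreed SNP panel and a shared seed.

    Raises KeyError if a panel SNP has no call, since both parties need every position.
    """
    material = ";".join(f"{snp}={canonical_genotype(genotypes[snp])}" for snp in panel)
    return hkdf_sha256(material.encode("ascii"), salt=seed, info=b"sequence-transfer-v1")


# Hypothetical example: the sequencing facility and the researcher each call
# the same individual's genotypes at the agreed panel and derive the key locally.
panel = ["snp_01", "snp_02", "snp_03"]   # hypothetical panel identifiers
seed = b"hypothetical-agreed-seed"        # hypothetical shared seed
facility_calls = {"snp_01": "AG", "snp_02": "CC", "snp_03": "T/C"}
researcher_calls = {"snp_01": "GA", "snp_02": "CC", "snp_03": "CT"}

facility_key = derive_transfer_key(facility_calls, panel, seed)
researcher_key = derive_transfer_key(researcher_calls, panel, seed)
print(facility_key == researcher_key)  # True: both sides derive the key; it is never sent
```

Sorting the alleles makes the key independent of how each pipeline orders an unphased call. Using the seed as the salt means the same genotypes paired with a different seed yield an unrelated key.

This exact-match sketch is deliberately simple. A single discordant genotype call produces a completely different key. It therefore does not reproduce the robustness to sequencing errors that the abstract reports, which would require an additional error-tolerant step.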