GenoGuard: Protecting Genomic Data against Brute-Force Attacks

Secure storage of genomic data is of great and increasing importance. The scientific community's improving ability to interpret individuals' genetic materials and the growing size of genetic database populations have been aggravating the potential consequences of data breaches. The prevalent use of passwords to generate encryption keys thus poses an especially serious problem when applied to genetic data. Weak passwords can jeopardize genetic data in the short term, but given the multi-decade lifespan of genetic data, even the use of strong passwords with conventional encryption can lead to compromise. We present a tool, called Geno Guard, for providing strong protection for genomic data both today and in the long term. Geno Guard incorporates a new theoretical framework for encryption called honey encryption (HE): it can provide information-theoretic confidentiality guarantees for encrypted data. Previously proposed HE schemes, however, can be applied to messages from, unfortunately, a very restricted set of probability distributions. Therefore, Geno Guard addresses the open problem of applying HE techniques to the highly non-uniform probability distributions that characterize sequences of genetic data. In Geno Guard, a potential adversary can attempt exhaustively to guess keys or passwords and decrypt via a brute-force attack. We prove that decryption under any key will yield a plausible genome sequence, and that Geno Guard offers an information-theoretic security guarantee against message-recovery attacks. We also explore attacks that use side information. Finally, we present an efficient and parallelized software implementation of Geno Guard.

[1]  Emiliano De Cristofaro,et al.  Countering GATTACA: efficient and secure testing of fully-sequenced human genomes , 2011, CCS '11.

[2]  Jean-Pierre Hubaux,et al.  Addressing the concerns of the lacks family: quantification of kin genomic privacy , 2013, CCS.

[3]  Angelos D. Keromytis,et al.  Detecting Targeted Attacks Using Shadow Honeypots , 2005, USENIX Security Symposium.

[4]  Noah A Rosenberg,et al.  Mathematical properties of the r2 measure of linkage disequilibrium. , 2008, Theoretical population biology.

[5]  S. Salzberg,et al.  Microbial gene identification using interpolated Markov models. , 1998, Nucleic acids research.

[6]  Yaniv Erlich,et al.  Routes for breaching and protecting genetic privacy , 2013, Nature Reviews Genetics.

[7]  Somesh Jha,et al.  Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing , 2014, USENIX Security Symposium.

[8]  L. Spitzner,et al.  Honeypots: Tracking Hackers , 2002 .

[9]  Jon Crowcroft,et al.  Honeycomb , 2004, Comput. Commun. Rev..

[10]  Murat Kantarcioglu,et al.  A Cryptographic Approach to Securely Share and Query Genomic Sequences , 2008, IEEE Transactions on Information Technology in Biomedicine.

[11]  Haixu Tang,et al.  Learning your identity and disease from research papers: information leaks in genome wide association study , 2009, CCS.

[12]  Zhou Li,et al.  Privacy-preserving genomic computation through program specialization , 2009, CCS.

[13]  M. McPeek,et al.  Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine-scale genetic mapping. , 1999, American journal of human genetics.

[14]  Stephen E. Fienberg,et al.  Privacy Preserving GWAS Data Sharing , 2011, 2011 IEEE 11th International Conference on Data Mining Workshops.

[15]  J. Doug Tygar,et al.  The battle against phishing: Dynamic Security Skins , 2005, SOUPS '05.

[16]  Carl A. Gunter,et al.  Privacy and Security in the Genomic Era , 2014, CCS.

[17]  Blase Ur,et al.  Measuring password guessability for an entire university , 2013, CCS.

[18]  L R Squire,et al.  On the relationship between recall and recognition memory. , 1992, Journal of experimental psychology. Learning, memory, and cognition.

[19]  Cormac Herley,et al.  A large-scale study of web password habits , 2007, WWW '07.

[20]  Guofei Gu,et al.  HoneyStat: Local Worm Detection Using Honeypots , 2004, RAID.

[21]  D. Reich,et al.  Principal components analysis corrects for stratification in genome-wide association studies , 2006, Nature Genetics.

[22]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[23]  Manfred Kayser,et al.  The HIrisPlex system for simultaneous prediction of hair and eye colour from DNA. , 2013, Forensic science international. Genetics.

[24]  Ronald L. Rivest,et al.  Honeywords: making password-cracking detectable , 2013, CCS.

[25]  Thomas Ristenpart,et al.  Honey Encryption: Security Beyond the Brute-Force Bound , 2014, IACR Cryptol. ePrint Arch..

[26]  Paul Suetens,et al.  Modeling 3D Facial Shape from DNA , 2014, PLoS genetics.

[27]  Henry L. Owen,et al.  The use of Honeynets to detect exploited systems across large enterprise networks , 2003, IEEE Systems, Man and Cybernetics SocietyInformation Assurance Workshop, 2003..

[28]  Vitaly Shmatikov,et al.  Towards Practical Privacy for Genomic Computation , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[29]  David Malone,et al.  Investigating the distribution of password choices , 2011, WWW.

[30]  Thierry Paul,et al.  Quantum computation and quantum information , 2007, Mathematical Structures in Computer Science.

[31]  Nikita Borisov,et al.  The Tangled Web of Password Reuse , 2014, NDSS.

[32]  Dan Boneh,et al.  Kamouflage: Loss-Resistant Password Management , 2010, ESORICS.

[33]  M. Stephens,et al.  Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. , 2003, Genetics.

[34]  P. Donnelly,et al.  A new multipoint method for genome-wide association studies by imputation of genotypes , 2007, Nature Genetics.

[35]  Vitaly Shmatikov,et al.  Privacy-preserving data exploration in genome-wide association studies , 2013, KDD.

[36]  Sonja Kuhnt,et al.  Goodness‐of‐Fit Tests , 2014 .

[37]  Dario Maio,et al.  Synthetic fingerprint-image generation , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[38]  S. Nelson,et al.  Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays , 2008, PLoS genetics.

[39]  Stephen E. Fienberg,et al.  Scalable privacy-preserving data sharing methodology for genome-wide association studies , 2014, J. Biomed. Informatics.

[40]  Burton S. Kaliski,et al.  PKCS #5: Password-Based Cryptography Specification Version 2.0 , 2000, RFC.

[41]  Joseph Bonneau,et al.  The Science of Guessing: Analyzing an Anonymized Corpus of 70 Million Passwords , 2012, 2012 IEEE Symposium on Security and Privacy.

[42]  Messaoud Benantar,et al.  Access Control Systems: Security, Identity Management and Trust Models , 2005 .

[43]  Jean-Pierre Hubaux,et al.  Protecting and evaluating genomic privacy in medical tests and personalized medicine , 2013, WPES.

[44]  Hongyu Zhao,et al.  Selecting SNPs to Identify Ancestry , 2011, Annals of human genetics.

[45]  Trent Jaeger,et al.  Password Exhaustion: Predicting the End of Password Usefulness , 2006, ICISS.