The Mastermind Attack on Genomic Data

In this paper, we study the degree to which a genomic string, $Q$, leaks details about itself whenever it engages in comparison protocols with a genomic querier, Bob, even if those protocols are cryptographically guaranteed to reveal no information other than the scores that measure how closely $Q$ matches strings offered by Bob. We show that such scenarios allow Bob to play variants of the game of Mastermind with $Q$ so as to learn the complete identity of $Q$. We present several efficient strategies Bob can employ in these Mastermind attacks, depending on the knowledge he has about the structure of $Q$, and show how quickly each one lets him determine $Q$. Indeed, we show that Bob can discover $Q$ using a number of rounds of test comparisons that is much smaller than the length of $Q$, under various assumptions regarding the types of scores returned by the cryptographic protocols and whether he can use knowledge about the distribution that $Q$ comes from, e.g., public knowledge about the properties of human DNA. We also present the results of an experimental study we performed on a database of mitochondrial DNA, showing that existing real-world DNA data is vulnerable to the Mastermind attack.
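To make the idea concrete, here is a minimal Python sketch built on assumptions that are ours, not the paper's. The protocol returns only a black-peg score (the number of positions where Bob's guess agrees with $Q$). Bob holds a public reference sequence that differs from $Q$ at only a few positions, as is typical of mitochondrial DNA. He may also submit a filler symbol `#` that matches no base. Bob blanks out halves of the reference and uses the drop in score to count how many reference positions in each half disagree with $Q$. He binary-searches down to the differing positions, then tries alternative bases at each one. The names `black_peg_score` and `mastermind_attack` are hypothetical. This illustrates the kind of adaptive querying the abstract describes and is not the authors' algorithm.

```python
from typing import Callable, List, Tuple

ALPHABET = "ACGT"
BLANK = "#"  # assumption: a filler symbol that never matches any base of Q


def black_peg_score(guess: str, secret: str) -> int:
    """Oracle output: number of positions where guess and secret agree."""
    return sum(g == s for g, s in zip(guess, secret))


def mastermind_attack(reference: str, score: Callable[[str], int]) -> Tuple[str, int]:
    """Recover the secret from black-peg scores alone; return (secret, #queries)."""
    n = len(reference)
    queries = 0

    def ask(guess: str) -> int:
        nonlocal queries
        queries += 1
        return score(guess)

    base = ask(reference)  # matches between the reference and Q

    def mismatches_in(lo: int, hi: int) -> int:
        # Blanking reference[lo:hi] lowers the score by exactly the number of
        # positions in that block where the reference agrees with Q.
        guess = reference[:lo] + BLANK * (hi - lo) + reference[hi:]
        agree = base - ask(guess)
        return (hi - lo) - agree

    diffs: List[int] = []

    def locate(lo: int, hi: int, m: int) -> None:
        # m is the known number of positions in [lo, hi) where reference != Q.
        if m == 0:
            return
        if hi - lo == m:  # every position in the block differs
            diffs.extend(range(lo, hi))
            return
        mid = (lo + hi) // 2
        left = mismatches_in(lo, mid)
        locate(lo, mid, left)
        locate(mid, hi, m - left)

    locate(0, n, n - base)

    # At each differing position, Q holds one of the three other bases;
    # a correct substitution raises the score by one.
    recovered = list(reference)
    for i in diffs:
        candidates = [c for c in ALPHABET if c != reference[i]]
        for c in candidates[:-1]:
            if ask(reference[:i] + c + reference[i + 1:]) == base + 1:
                recovered[i] = c
                break
        else:
            recovered[i] = candidates[-1]  # only remaining possibility
    return "".join(recovered), queries


if __name__ == "__main__":
    reference = "ACGT" * 8  # 32-base public reference
    secret = reference[:5] + "T" + reference[6:20] + "G" + reference[21:]
    guess, q = mastermind_attack(reference, lambda g: black_peg_score(g, secret))
    print(guess == secret, q)
```

With $d$ differing positions, the search phase needs on the order of $d \log_2 n$ queries and the base-identification phase at most $2d$. For small $d$ the total is therefore far below $n$; in the demo, the 32-base secret with two mutations is recovered in fewer queries than it has positions.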
