Efficient Genome-Wide, Privacy-Preserving Similar Patient Query based on Private Edit Distance

Edit distance has been proven to be an important and frequently-used metric in many human genomic research, with Similar Patient Query (SPQ) being a particularly promising and attractive example. However, due to the widespread privacy concerns on revealing personal genomic data, the scope and scale of many novel use of genome edit distance are substantially limited. While the problem of private genomic edit distance has been studied by the research community for over a decade [6], the state-of-the-art solution [31] is far from even close to be applicable to real genome sequences. In this paper, we propose several private edit distance protocols that feature unprecedentedly high efficiency and precision. Our construction is a combination of a novel genomic edit distance ap- proximation algorithm and new construction of private set difference size protocols. With the private edit distance based secure SPQ primitive, we propose GENSETS, a genome-wide, privacy- preserving similar patient query system. It is able to support search- ing large-scale, distributed genome databases across the nation. We have implemented a prototype of GENSETS. The experimental results show that, with 100 Mbps network connection, it would take GENSETS less than 200 minutes to search through 1 million breast cancer patients (distributed nation-wide in 250 hospitals, each having 4000 patients), based on edit distances between their genomes of lengths about 75 million nucleotides each.

[1]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[2]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, FOCS 1982.

[3]  Andrew Chi-Chih Yao,et al.  Protocols for Secure Computations (Extended Abstract) , 1982, FOCS.

[4]  Andrew Chi-Chih Yao,et al.  How to generate and exchange secrets , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).

[5]  Silvio Micali,et al.  How to play ANY mental game , 1987, STOC.

[6]  Avi Wigderson,et al.  Completeness theorems for non-cryptographic fault-tolerant distributed computation , 1988, STOC '88.

[7]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[8]  D. Altman Construction of age-related reference centiles using absolute residuals. , 1993, Statistics in medicine.

[9]  Noga Alon,et al.  The space complexity of approximating the frequency moments , 1996, STOC '96.

[10]  Donald Beaver,et al.  Correlated pseudorandomness and the complexity of private computations , 1996, STOC '96.

[11]  Donald Beaver,et al.  Commodity-based cryptography (extended abstract) , 1997, STOC '97.

[12]  Moni Naor,et al.  Distributed Oblivious Transfer , 2000, ASIACRYPT.

[13]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2000, Journal of Cryptology.

[14]  Elizabeth M. Smigielski,et al.  dbSNP: the NCBI database of genetic variation , 2001, Nucleic Acids Res..

[15]  S. Chanock,et al.  Using genetic variation to study human disease. , 2001, Trends in molecular medicine.

[16]  Moni Naor,et al.  Efficient oblivious transfer protocols , 2001, SODA '01.

[17]  Mahesh Viswanathan,et al.  An Approximate L1-Difference Algorithm for Massive Data Streams , 2002, SIAM J. Comput..

[18]  Chuong B. Do,et al.  Access the most recent version at doi: 10.1101/gr.926603 References , 2003 .

[19]  Wenliang Du,et al.  Secure and private sequence comparisons , 2003, WPES '03.

[20]  Yuval Ishai,et al.  Extending Oblivious Transfers Efficiently , 2003, CRYPTO.

[21]  M. Relling,et al.  Moving towards individualized medicine with pharmacogenomics , 2004, Nature.

[22]  Mikkel Thorup,et al.  Tabulation based 4-universal hashing with applications to second moment estimation , 2004, SODA '04.

[23]  Benny Pinkas,et al.  Fairplay - Secure Two-Party Computation System , 2004, USENIX Security Symposium.

[24]  Donald Beaver,et al.  Secure multiparty protocols and zero-knowledge proof systems tolerating a faulty minority , 2004, Journal of Cryptology.

[25]  D. Haussler,et al.  Aligning multiple genomic sequences with the threaded blockset aligner. , 2004, Genome research.

[26]  Robert Krauthgamer,et al.  Approximating edit distance efficiently , 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[27]  G. Church,et al.  The Personal Genome Project , 2005, Molecular systems biology.

[28]  Graham Cormode,et al.  An improved data stream summary: the count-min sketch and its applications , 2004, J. Algorithms.

[29]  Joan Feigenbaum,et al.  Secure multiparty computation of approximations , 2001, TALG.

[30]  Graham Cormode,et al.  On Estimating Frequency Moments of Data Streams , 2007, APPROX-RANDOM.

[31]  P. Flajolet,et al.  HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm , 2007 .

[32]  Vitaly Shmatikov,et al.  Efficient Two-Party Secure Computation on Committed Inputs , 2007, EUROCRYPT.

[33]  Pierre Baldi,et al.  Mathematical Correction for Fingerprint Similarity Measures to Improve Chemical Retrieval , 2007, J. Chem. Inf. Model..

[34]  Yuval Ishai,et al.  Founding Cryptography on Oblivious Transfer - Efficiently , 2008, CRYPTO.

[35]  Vladimir Kolesnikov,et al.  Improved Garbled Circuit: Free XOR Gates and Applications , 2008, ICALP.

[36]  Yehuda Lindell,et al.  A Proof of Security of Yao’s Protocol for Two-Party Computation , 2009, Journal of Cryptology.

[37]  Alexandr Andoni,et al.  Approximating edit distance in near-linear time , 2009, STOC '09.

[38]  Benny Pinkas,et al.  Secure Two-Party Computation is Practical , 2009, IACR Cryptol. ePrint Arch..

[39]  N. Hawkins,et al.  Assessing the Privacy Risks of Data Sharing in Genomics , 2010, Public Health Genomics.

[40]  David P. Woodruff,et al.  Fast Manhattan sketches in data streams , 2010, PODS '10.

[41]  Jonathan Katz,et al.  Faster Secure Two-Party Computation Using Garbled Circuits , 2011, USENIX Security Symposium.

[42]  Abhi Shelat,et al.  Efficient Secure Computation with Garbled Circuits , 2011, ICISS.

[43]  George Varghese,et al.  What's the difference?: efficient set reconciliation without prior context , 2011, SIGCOMM.

[44]  Claudio Orlandi,et al.  A New Approach to Practical Active-Secure Two-Party Computation , 2012, IACR Cryptol. ePrint Arch..

[45]  Emiliano De Cristofaro,et al.  Countering GATTACA: efficient and secure testing of fully-sequenced human genomes , 2011, CCS '11.

[46]  Ivan Damgård,et al.  Multiparty Computation from Somewhat Homomorphic Encryption , 2012, IACR Cryptol. ePrint Arch..

[47]  Jonathan Katz,et al.  Quid-Pro-Quo-tocols: Strengthening Semi-honest Protocols with Dual Execution , 2012, 2012 IEEE Symposium on Security and Privacy.

[48]  Steven J. M. Jones,et al.  Comprehensive molecular portraits of human breast tumors , 2012, Nature.

[49]  Yehuda Lindell,et al.  More efficient oblivious transfer and extensions for faster secure computation , 2013, CCS.

[50]  Claudio Orlandi,et al.  MiniLEGO: Efficient Secure Two-Party Computation from General Assumptions , 2013, EUROCRYPT.

[51]  Mihir Bellare,et al.  Efficient Garbling from a Fixed-Key Blockcipher , 2013, 2013 IEEE Symposium on Security and Privacy.

[52]  Steven J. M. Jones,et al.  Comprehensive molecular portraits of human breast tumours , 2013 .

[53]  Changyu Dong,et al.  When private set intersection meets big data: an efficient and scalable protocol , 2013, CCS.

[54]  Emiliano De Cristofaro,et al.  EsPRESSO: Efficient privacy-preserving evaluation of sample set similarity , 2014, J. Comput. Secur..

[55]  M. Watson,et al.  Illuminating the future of DNA sequencing , 2014, Genome Biology.

[56]  Elaine Shi,et al.  Automating Efficient RAM-Model Secure Computation , 2014, 2014 IEEE Symposium on Security and Privacy.

[57]  Yehuda Lindell,et al.  An Efficient Protocol for Secure Two-Party Computation in the Presence of Malicious Adversaries , 2007, Journal of Cryptology.

[58]  David Evans,et al.  Two Halves Make a Whole - Reducing Data Transfer in Garbled Circuits Using Half Gates , 2015, EUROCRYPT.

[59]  J. Kench,et al.  Whole genomes redefine the mutational landscape of pancreatic cancer , 2015, Nature.