Privacy-Preserving Search of Similar Patients in Genomic Data

Abstract The growing availability of genomic data holds great promise for advancing medicine and research, but unlocking its full potential requires adequate methods for protecting the privacy of individuals whose genome data we use. One example of this tension is running Similar Patient Query on remote genomic data: In this setting a doctor that holds the genome of his/her patient may try to find other individuals with “close” genomic data, and use the data of these individuals to help diagnose and find effective treatment for that patient’s conditions. This is clearly a desirable mode of operation. However, the privacy exposure implications are considerable, and so we would like to carry out the above “closeness” computation in a privacy preserving manner. In this work we put forward a new approach for highly efficient secure computation for computing an approximation of the Similar Patient Query problem. We present contributions on two fronts. First, an approximation method that is designed with the goal of achieving efficient private computation. Second, further optimizations of the two-party protocol. Our tests indicate that the approximation method works well, it returns the exact closest records in 98% of the queries and very good approximation otherwise. As for speed, our protocol implementation takes just a few seconds to run on databases with thousands of records, each of length thousands of alleles, and it scales almost linearly with both the database size and the length of the sequences in it. As an example, in the datasets of the recent iDASH competition, after a one-time preprocessing of around 12 seconds, it takes around a second to find the nearest five records to a query, in a size-500 dataset of length- 3500 sequences. This is 2-3 orders of magnitude faster than using state-of-the-art secure protocols with existing edit distance algorithms.

[1]  Emiliano De Cristofaro,et al.  Countering GATTACA: efficient and secure testing of fully-sequenced human genomes , 2011, CCS '11.

[2]  Silvio Micali,et al.  How to play any mental game, or a completeness theorem for protocols with honest majority , 2019, Providing Sound Foundations for Cryptography.

[3]  Yan Huang,et al.  Efficient Genome-Wide, Privacy-Preserving Similar Patient Query based on Private Edit Distance , 2015, CCS.

[4]  Jonathan Katz,et al.  Faster Secure Two-Party Computation Using Garbled Circuits , 2011, USENIX Security Symposium.

[5]  Vladimir Kolesnikov,et al.  Improved Garbled Circuit: Free XOR Gates and Applications , 2008, ICALP.

[6]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .

[7]  Piotr Indyk,et al.  Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false) , 2014, STOC.

[8]  Jure Leskovec,et al.  Mining of Massive Datasets, 2nd Ed , 2014 .

[9]  Yehuda Lindell,et al.  A Proof of Security of Yao’s Protocol for Two-Party Computation , 2009, Journal of Cryptology.

[10]  Andrew Chi-Chih Yao,et al.  How to Generate and Exchange Secrets (Extended Abstract) , 1986, FOCS.

[11]  Alexandr Andoni,et al.  Approximating edit distance in near-linear time , 2009, STOC '09.

[12]  David Evans,et al.  Two Halves Make a Whole - Reducing Data Transfer in Garbled Circuits Using Half Gates , 2015, EUROCRYPT.

[13]  Dima Alhadidi,et al.  Secure approximation of edit distance on genomic data , 2017, BMC Medical Genomics.

[14]  Yehuda Lindell,et al.  More efficient oblivious transfer and extensions for faster secure computation , 2013, CCS.

[15]  Din J. Wasem Mining of Massive Datasets , 2014 .

[16]  Marcel Keller,et al.  Actively Secure OT Extension with Optimal Overhead , 2015, CRYPTO.

[17]  Yan Huang,et al.  Efficient Privacy-Preserving Edit Distance and Beyond , 2017, IACR Cryptol. ePrint Arch..

[18]  Yan Huang,et al.  Efficient Privacy-Preserving General Edit Distance and Beyond , 2017 .

[19]  Abhi Shelat,et al.  Efficient Secure Computation with Garbled Circuits , 2011, ICISS.

[20]  Oded Goldreich,et al.  The Foundations of Cryptography - Volume 2: Basic Applications , 2001 .

[21]  Yehuda Lindell,et al.  SCAPI: The Secure Computation Application Programming Interface , 2012, IACR Cryptol. ePrint Arch..

[22]  Joan Feigenbaum,et al.  Secure multiparty computation of approximations , 2001, TALG.

[23]  Michael J. Fischer,et al.  The String-to-String Correction Problem , 1974, JACM.

[24]  Ran Canetti,et al.  Security and Composition of Multiparty Cryptographic Protocols , 2000, Journal of Cryptology.

[25]  Carl A. Gunter,et al.  Privacy in the Genomic Era , 2014, ACM Comput. Surv..

[26]  Mete Akgün,et al.  Privacy preserving processing of genomic data: A survey , 2015, J. Biomed. Informatics.

[27]  Vitaly Shmatikov,et al.  Towards Practical Privacy for Genomic Computation , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[28]  S. Rajsbaum Foundations of Cryptography , 2014 .