Privacy-Preserving Genome-Wide Association Study is Practical

The deployment of Genome-wide association studies (GWASs) requires genomic information of a large population to produce reliable results. This raises significant privacy concerns, making people hesitate to contribute their genetic information to such studies. We propose two provably secure solutions to address this challenge: (1) a somewhat homomorphic encryption (HE) approach, and (2) a secure multiparty computation (MPC) approach. Unlike previous work, our approach does not rely on adding noise to the input data, nor does it reveal any information about the patients. Our protocols aim to prevent data breaches by calculating the χ statistic in a privacy-preserving manner, without revealing any information other than whether the statistic is significant or not. Specifically, our protocols compute the χ statistic, but only return a yes/no answer, indicating significance. By not revealing the statistic value itself but only the significance, our approach thwarts attacks exploiting statistic values. We significantly increased the efficiency of our HE protocols by introducing a new masking technique to perform the secure comparison that is necessary for determining significance. We show that full-scale privacy-preserving GWAS is practical, as long as the statistics can be computed by low degree polynomials. Our implementations demonstrated that both approaches are efficient. The secure multiparty computation technique completes its execution in approximately 2 ms for data contributed by one million subjects.

[1]  Kristin E. Lauter,et al.  Private genome analysis through homomorphic encryption , 2015, BMC Medical Informatics and Decision Making.

[2]  Chris Peikert,et al.  On Ideal Lattices and Learning with Errors over Rings , 2010, JACM.

[3]  Ivan Damgård,et al.  Confidential Benchmarking Based on Multiparty Computation , 2016, Financial Cryptography.

[4]  Martin R. Albrecht,et al.  On the concrete hardness of Learning with Errors , 2015, J. Math. Cryptol..

[5]  Yves Moreau,et al.  NGS-Logistics: federated analysis of NGS sequence variants across multiple locations , 2014, Genome Medicine.

[6]  Frederik Vercauteren,et al.  Somewhat Practical Fully Homomorphic Encryption , 2012, IACR Cryptol. ePrint Arch..

[7]  Srinivas Vivek,et al.  Fixed-Point Arithmetic in SHE Schemes , 2016, SAC.

[8]  Oded Goldreich,et al.  A randomized protocol for signing contracts , 1985, CACM.

[9]  Carl Bootland,et al.  Faster Homomorphic Function Evaluation using Non-Integral Base Encoding , 2017, IACR Cryptol. ePrint Arch..

[10]  Michael Naehrig,et al.  Improved Security for a Ring-Based Fully Homomorphic Encryption Scheme , 2013, IMACC.

[11]  Vinod Vaikuntanathan,et al.  Can homomorphic encryption be practical? , 2011, CCSW '11.

[12]  Michael Naehrig,et al.  Manual for Using Homomorphic Encryption for Bioinformatics , 2017, Proceedings of the IEEE.

[13]  Yihua Zhang,et al.  Secure distributed genome analysis for GWAS and sequence comparison computation , 2015, BMC Medical Informatics and Decision Making.

[14]  David J. Wu,et al.  Secure genome-wide association analysis using multiparty computation , 2018, Nature Biotechnology.

[15]  Jun Sakuma,et al.  Privacy-preserving genome-wide association studies on cloud environment using fully homomorphic encryption , 2015, BMC Medical Informatics and Decision Making.

[16]  Dan Bogdanov,et al.  A new way to protect privacy in large-scale genome-wide association studies , 2013, Bioinform..

[17]  Karl Pearson F.R.S. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling , 2009 .

[18]  Michael Naehrig,et al.  CryptoNets: applying neural networks to encrypted data with high throughput and accuracy , 2016, ICML 2016.

[19]  Bonnie Berger,et al.  Realizing privacy preserving genome-wide association studies , 2016, Bioinform..

[20]  Octavian Catrina,et al.  Improved Primitives for Secure Multiparty Integer Computation , 2010, SCN.

[21]  Xiaoqian Jiang,et al.  SAFETY: Secure gwAs in Federated Environment through a hYbrid Solution , 2019, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[22]  Marcel Keller,et al.  Actively Secure OT Extension with Optimal Overhead , 2015, CRYPTO.

[23]  Bradley Malin,et al.  Re-identification of Familial Database Records , 2006, AMIA.

[24]  Donald Beaver,et al.  Efficient Multiparty Protocols Using Circuit Randomization , 1991, CRYPTO.

[25]  S. Nelson,et al.  Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays , 2008, PLoS genetics.

[26]  K. Pearson On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling , 1900 .

[27]  Michael Naehrig,et al.  Private Computation on Encrypted Genomic Data , 2014, LATINCRYPT.

[28]  Ivan Damgård,et al.  Multiparty Computation from Somewhat Homomorphic Encryption , 2012, IACR Cryptol. ePrint Arch..

[29]  Stephen E. Fienberg,et al.  Privacy-Preserving Data Sharing for Genome-Wide Association Studies , 2012, J. Priv. Confidentiality.

[30]  Tancrède Lepoint,et al.  NFLlib: NTT-Based Fast Lattice Library , 2016, CT-RSA.

[31]  Xiaoqian Jiang,et al.  Privacy-preserving GWAS analysis on federated genomic datasets , 2015, BMC Medical Informatics and Decision Making.

[32]  Marcel Keller,et al.  Practical Covertly Secure MPC for Dishonest Majority - Or: Breaking the SPDZ Limits , 2013, ESORICS.

[33]  Rachel G Liao,et al.  A federated ecosystem for sharing genomic, clinical data , 2016, Science.

[34]  Niv Gilboa,et al.  Two Party RSA Key Generation , 1999, CRYPTO.

[35]  Michael O. Rabin,et al.  How To Exchange Secrets with Oblivious Transfer , 2005, IACR Cryptol. ePrint Arch..

[36]  Xiaoqian Jiang,et al.  FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption , 2015, BMC Medical Informatics and Decision Making.