Towards practical privacy-preserving genome-wide association study

Background: Genome-wide association studies (GWASs) require genomic information from a large population to produce reliable results. This raises significant privacy concerns and makes people hesitant to contribute their genetic information to such studies.

Results: We propose two provably secure solutions to address this challenge: (1) a somewhat homomorphic encryption (HE) approach, and (2) a secure multiparty computation (MPC) approach. Unlike previous work, our approach does not rely on adding noise to the input data, nor does it reveal any information about the patients. Our protocols compute the χ² statistic in a privacy-preserving manner and return only a yes/no answer indicating whether the statistic is significant. Because the value of the statistic itself is never revealed, our approach thwarts attacks that exploit statistic values. We significantly increased the efficiency of our HE protocols by introducing a new masking technique for the secure comparison that is needed to determine significance.

Conclusions: We show that full-scale privacy-preserving GWAS is practical, as long as the statistics can be computed by low-degree polynomials. Our implementations demonstrate that both approaches are efficient. The secure multiparty computation technique completes its execution in approximately 2 ms for data contributed by one million subjects.
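To make the "low-degree polynomial" condition concrete, the following plaintext sketch shows one way a χ² significance test can be decided without any division. It assumes a 2×2 allelic contingency table with counts a, b (cases) and c, d (controls), uses the standard Pearson formula χ² = N(ad − bc)² / ((a+b)(c+d)(a+c)(b+d)), and compares both sides after cross-multiplying by the denominator. The function name, the table layout, and the default threshold (3.841, the 5% critical value for one degree of freedom) are illustrative assumptions, not details taken from the abstract, and the sketch performs no encryption or secret sharing.

```python
from fractions import Fraction


def chi2_is_significant(a, b, c, d, threshold=Fraction(3841, 1000)):
    """Return only a yes/no answer: is the Pearson chi-squared statistic of
    the 2x2 table [[a, b], [c, d]] at least `threshold`?

    Hypothetical plaintext illustration. The statistic itself is never
    materialised; the test is a comparison of two integer polynomials in
    the counts, which is the kind of computation HE and MPC handle well.
    """
    n = a + b + c + d
    # Numerator of chi-squared: N * (ad - bc)^2
    lhs = n * (a * d - b * c) ** 2
    # Denominator of chi-squared: product of the four marginal totals
    rhs = (a + b) * (c + d) * (a + c) * (b + d)
    if rhs == 0:
        # Degenerate table (an empty row or column): report "not significant".
        return False
    # chi2 >= T  <=>  lhs * T_den >= T_num * rhs  (all quantities non-negative)
    return lhs * threshold.denominator >= threshold.numerator * rhs


if __name__ == "__main__":
    print(chi2_is_significant(60, 40, 40, 60))  # chi-squared is 8 here
    print(chi2_is_significant(52, 48, 48, 52))  # chi-squared is 0.32 here
```

In a privacy-preserving setting, the two products would be evaluated on encrypted or secret-shared counts and only the outcome of the final comparison would be opened, so the caller learns the significance bit and nothing else.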
