Achieving GWAS with homomorphic encryption

A genome-wide association study (GWAS) is one way of investigating how genes affect human traits. GWAS relies on genetic markers known as single-nucleotide polymorphisms (SNPs). This raises privacy and security concerns, because these markers can be used to identify individuals uniquely. The problem is exacerbated by the large number of SNPs a study needs: more SNPs give more reliable results, but also a higher risk of compromising the privacy of participants. We describe a method that uses homomorphic encryption (HE) to perform GWAS in a secure and private setting. This work is based on a previously proposed algorithm. Our solution mainly involves homomorphically encrypted matrix operations and suitable approximations that adapt the semi-parallel GWAS algorithm for HE. We leverage the complex space of the CKKS encryption scheme to increase the number of SNPs that can be packed within a ciphertext. We have also developed a cache module that manages ciphertexts, reducing the memory footprint. We have implemented our solution over two open-source HE libraries, HEAAN and SEAL. Our best implementation took 24.70 minutes for a dataset with 245 samples, 4 covariates, and 10643 SNPs. We demonstrate that, with suitable approximations, it is possible to achieve GWAS with homomorphic encryption.
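The idea of using the complex space to pack more SNPs can be illustrated in plaintext. The sketch below is not the authors' implementation and does not call HEAAN or SEAL; it only assumes that each CKKS slot holds a complex number, and it uses hypothetical helper names (`pack_complex`, `unpack_complex`, `scale_slots`, `add_slots`) and made-up genotype values. It shows that two real-valued SNP vectors placed in the real and imaginary parts of the same slots remain separable after additions and multiplications by real coefficients.

```python
# Plaintext simulation of complex-slot packing (assumption: one complex
# number per slot, as in CKKS). No encryption is performed here.

def pack_complex(xs, ys):
    """Pack two real vectors into one vector of complex slots (x + i*y)."""
    if len(xs) != len(ys):
        raise ValueError("both vectors must have the same length")
    return [complex(x, y) for x, y in zip(xs, ys)]

def unpack_complex(slots):
    """Recover the two real vectors from the real and imaginary parts."""
    return [z.real for z in slots], [z.imag for z in slots]

def scale_slots(slots, weights):
    """Slot-wise multiplication by real weights; acts on both parts alike."""
    return [w * z for w, z in zip(weights, slots)]

def add_slots(u, v):
    """Slot-wise addition of two packed vectors."""
    return [a + b for a, b in zip(u, v)]

# Hypothetical genotype values for two SNPs over four samples.
snp_a = [0.0, 1.0, 2.0, 1.0]
snp_b = [2.0, 0.0, 1.0, 1.0]
weights = [0.5, -1.0, 0.25, 2.0]  # real-valued coefficients

packed = pack_complex(snp_a, snp_b)
result = add_slots(scale_slots(packed, weights), packed)
out_a, out_b = unpack_complex(result)

# Each output equals (weight + 1) * snp, computed for both SNPs at once.
print(out_a)
print(out_b)
```

The separation holds only for operations that are linear with real coefficients. Multiplying two packed vectors together mixes the real and imaginary parts, so any scheme built on this trick has to arrange its computation accordingly.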
