Optimized homomorphic encryption solution for secure genome-wide association studies

A genome-wide association study (GWAS) is an observational study of a genome-wide set of genetic variants across many individuals, designed to determine whether any variant is associated with a given trait. A typical GWAS analysis of a disease phenotype involves iterative logistic regression of a case/control phenotype on a single-nucleotide polymorphism (SNP) with quantitative covariates. GWAS have been highly successful in identifying genetic-variant associations with many poorly understood diseases. However, a major limitation of GWAS is the dependence on individual-level genotype/phenotype data, which raises privacy concerns. We present a solution for secure GWAS using homomorphic encryption (HE) that keeps all individual data encrypted throughout the association study. Our solution is based on an optimized semi-parallel GWAS compute model, a new Residue-Number-System (RNS) variant of the Cheon-Kim-Kim-Song (CKKS) HE scheme, novel techniques to switch between data encodings, and more than a dozen crypto-engineering optimizations. Our prototype can perform the full GWAS computation for 1,000 individuals, 131,071 SNPs, and 3 covariates in about 10 minutes on a modern 28-core server node. Our solution for a smaller dataset was awarded co-first place in iDASH’18 Track 2: “Secure Parallel Genome Wide Association Studies using HE”. Many of the HE optimizations presented in our paper are general-purpose and can be applied to challenging problems with large datasets in other application domains.
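To make the semi-parallel compute model concrete, the following is a minimal plaintext sketch in Python with NumPy. It is not the encrypted implementation described above. It assumes the usual semi-parallel formulation: a logistic model is fitted once on the covariates alone, and each SNP effect is then approximated by a single weighted update computed for all SNPs at once. The function names `fit_null_logistic` and `semi_parallel_scores`, the iteration count, and the array layout are illustrative assumptions.

```python
import numpy as np


def fit_null_logistic(X, y, n_iter=25):
    """Fit logistic regression of y on covariates X (with intercept column)
    using Newton's method. Hypothetical helper for illustration only."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # predicted probabilities
        w = p * (1.0 - p)                        # Newton/IRLS weights
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * w[:, None])            # X^T W X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def semi_parallel_scores(X, y, S):
    """Approximate per-SNP effects and z-scores for all columns of S at once.

    X: (n, c) covariates including an intercept column
    y: (n,)  case/control phenotype coded 0/1
    S: (n, k) genotype matrix, one column per SNP
    """
    beta = fit_null_logistic(X, y)
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = p * (1.0 - p)

    # Remove the covariate component from every SNP column (weighted projection).
    XtWX = X.T @ (X * w[:, None])
    S_star = S - X @ np.linalg.solve(XtWX, X.T @ (S * w[:, None]))

    # One weighted update per SNP, vectorized over all SNPs.
    num = S_star.T @ (y - p)                                   # S*^T (y - p)
    den = np.einsum("ij,ij->j", S_star * w[:, None], S_star)   # S*^T W S*
    b = num / den                                              # effect estimates
    z = num / np.sqrt(den)                                     # b / standard error
    return b, z
```

Because the covariate-only fit is shared by every SNP, the per-SNP work reduces to matrix products over the whole genotype matrix, which is the kind of batched arithmetic that maps well onto packed ciphertext operations.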
