Semi-parallel logistic regression for GWAS on encrypted data

Sharing biomedical data is crucial for enabling scientific discoveries across institutions and improving health care. For example, genome-wide association studies (GWAS) based on a large number of samples can identify disease-causing genetic variants. Privacy concerns, however, have become a major hurdle for data management and utilization. Homomorphic encryption is one of the most powerful cryptographic primitives for addressing these privacy and security issues. It supports computation on encrypted data, so that data can be aggregated and arbitrary computations performed in an untrusted cloud environment without leaking sensitive information. This paper presents a secure outsourcing solution to assess logistic regression models for quantitative traits to test their associations with genotypes. We adapt the semi-parallel training method of Sikorska et al., which builds a logistic regression model for the covariates and then performs one-step, parallelizable regressions on all individual single nucleotide polymorphisms (SNPs). In addition, we modify the underlying approximate homomorphic encryption scheme to improve performance. We evaluated our solution through experiments on a real-world dataset. It achieves the best performance among homomorphic encryption systems for GWAS analysis in terms of both complexity and accuracy. For example, given a dataset of 245 samples, each with 10643 SNPs and 3 covariates, our algorithm takes about 43 seconds to perform logistic-regression-based genome-wide association analysis on encrypted data. These results demonstrate the feasibility and scalability of our solution.
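
As a rough illustration of the semi-parallel idea described above, the following plaintext Python sketch (no encryption involved) first fits a logistic model on the covariates by Newton iterations, then takes a single Newton step per SNP with the covariate fit held fixed. The function names (`fit_covariates`, `snp_one_step`), the fixed iteration count, and the small pure-Python linear solver are assumptions made for illustration only. They are not taken from the paper, and the encrypted version presumably differs in how each step is evaluated.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def weighted_gram(X, w):
    """Return X^T diag(w) X for a row-major matrix X."""
    k = len(X[0])
    return [[sum(wi * row[a] * row[b] for row, wi in zip(X, w))
             for b in range(k)] for a in range(k)]


def probabilities(X, beta):
    """Logistic probabilities sigmoid(X beta), one per sample."""
    return [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            for row in X]


def fit_covariates(X, y, iters=25):
    """Step 1: logistic regression on covariates only (Newton iterations)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = probabilities(X, beta)
        w = [pi * (1.0 - pi) for pi in p]
        grad = [sum(row[a] * (yi - pi) for row, yi, pi in zip(X, y, p))
                for a in range(k)]
        step = solve(weighted_gram(X, w), grad)
        beta = [b + s for b, s in zip(beta, step)]
    p = probabilities(X, beta)
    w = [pi * (1.0 - pi) for pi in p]
    return beta, p, w


def snp_one_step(X, y, p, w, s):
    """Step 2: one Newton step for a single SNP vector s.

    The SNP is first made orthogonal to the covariates in the weighted
    sense: s* = s - X (X^T W X)^{-1} X^T W s. The one-step estimate is
    then s*^T (y - p) / (s*^T W s*). Returns (estimate, z-statistic).
    """
    k = len(X[0])
    rhs = [sum(wi * row[a] * si for row, wi, si in zip(X, w, s))
           for a in range(k)]
    gamma = solve(weighted_gram(X, w), rhs)
    s_star = [si - sum(g * x for g, x in zip(gamma, row))
              for row, si in zip(X, s)]
    info = sum(wi * v * v for wi, v in zip(w, s_star))
    score = sum(v * (yi - pi) for v, yi, pi in zip(s_star, y, p))
    beta_snp = score / info
    return beta_snp, score / math.sqrt(info)
```

Because `snp_one_step` only reads the shared covariate fit (`p`, `w`), it can be called independently for every SNP, which is what makes the second stage parallelizable. In this sketch the weighted Gram matrix is recomputed on each call for simplicity; it could be computed once and reused.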
