Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation

Background: Learning a model without accessing raw data has intrigued security and machine learning researchers for years. Ideally, sensitive data would be encrypted, stored on a commercial cloud, and analyzed without ever being decrypted, thereby preserving privacy. Homomorphic encryption is a promising candidate for secure data outsourcing, but supporting real-world machine learning tasks with it remains very challenging. Existing frameworks can handle only simplified cases involving low-degree polynomials, such as the linear means classifier and linear discriminant analysis.

Objective: The goal of this study is to provide practical support for mainstream learning models (eg, logistic regression).

Methods: We adapted a novel homomorphic encryption scheme optimized for real-number computation. We devised (1) a least squares approximation of the logistic function for accuracy and efficiency (ie, reduced computation cost) and (2) new packing and parallelization techniques.

Results: Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in terms of speed and memory consumption. For example, it took approximately 116 minutes to obtain the trained model from the homomorphically encrypted Edinburgh dataset. In addition, the model gives fairly accurate predictions on the testing dataset.

Conclusions: We present the first homomorphically encrypted logistic regression outsourcing model, based on the critical observation that the precision loss of classification models is sufficiently small that the decision plane stays the same.
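To make the least squares approximation of the logistic function concrete, the following is a minimal sketch and not the authors' implementation. A polynomial is useful here because it can be evaluated with additions and multiplications only, which are the operations homomorphic schemes support. The degree (3), the interval [-8, 8], the sample count, and all function names are assumptions chosen for illustration; the abstract does not specify them. The fit is a discrete least squares fit on a uniform grid, solved through the normal equations in plain Python.

```python
import math


def sigmoid(x):
    """Logistic function that the polynomial is meant to approximate."""
    return 1.0 / (1.0 + math.exp(-x))


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_sigmoid_poly(degree=3, bound=8.0, samples=2001):
    """Least squares fit of sigmoid on [-bound, bound].

    Works in the scaled variable t = x / bound to keep the normal
    equations well conditioned. Returns coefficients c[0..degree]
    such that sigmoid(x) is approximated by sum(c[i] * t**i).
    All default values are illustrative assumptions.
    """
    ts = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    ys = [sigmoid(bound * t) for t in ts]
    n = degree + 1
    gram = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    rhs = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(n)]
    return solve_linear(gram, rhs)


def eval_poly(coeffs, x, bound=8.0):
    """Evaluate the fitted polynomial at x using Horner's rule."""
    t = x / bound
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc


coeffs = fit_sigmoid_poly()
# Largest absolute deviation from the true sigmoid on a test grid in [-8, 8].
max_err = max(
    abs(eval_poly(coeffs, k / 10) - sigmoid(k / 10)) for k in range(-80, 81)
)
```

Raising `degree` in this sketch generally reduces `max_err` at the cost of more multiplications per evaluation, which is the accuracy versus computation trade-off the Methods paragraph refers to.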
