Logistic regression model training based on the approximate homomorphic encryption

Background: Security concerns have been raised since big data became a prominent tool in data analysis. For instance, many machine learning algorithms aim to generate prediction models from training data that contain sensitive information about individuals. The cryptography community is considering secure computation as a solution for privacy protection. In particular, practical requirements have triggered research on the efficiency of cryptographic primitives.

Methods: This paper presents a method to train a logistic regression model without information leakage. We apply the homomorphic encryption scheme of Cheon et al. (ASIACRYPT 2017) for efficient arithmetic over real numbers, and devise a new encoding method to reduce the storage of the encrypted database. In addition, we adapt Nesterov’s accelerated gradient method to reduce the number of iterations as well as the computational cost while maintaining the quality of the output classifier.

Results: Our method shows state-of-the-art performance for a homomorphic encryption system in a real-world application. For example, it took about six minutes to obtain a logistic regression model from a dataset of 1579 samples, each of which has 18 features and a binary outcome variable. The submission based on this work was selected as the best solution of Track 3 at the iDASH privacy and security competition 2017.

Conclusions: We present a practical solution for outsourcing analysis tools such as logistic regression analysis while preserving data confidentiality.
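To make the optimization step mentioned in the abstract concrete, the following is a minimal plaintext sketch of Nesterov's accelerated gradient applied to logistic regression. It is not the encrypted procedure of the paper: nothing here is encrypted, and the function names, the toy data, the step size `lr`, the iteration count, and the particular momentum schedule (a common textbook formulation) are all assumptions made for illustration. Labels are taken in {-1, +1} and each sample is folded into a vector z = y * (1, x), so the loss is the mean of log(1 + exp(-z . beta)).

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def loss(beta, z):
    """Mean logistic loss: (1/n) * sum_i log(1 + exp(-z_i . beta))."""
    total = 0.0
    for zi in z:
        m = dot(zi, beta)
        # Stable evaluation of log(1 + exp(-m)).
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total / len(z)


def gradient(beta, z):
    """Gradient of the mean logistic loss: -(1/n) * sum_i sigmoid(-z_i . beta) * z_i."""
    n = len(z)
    g = [0.0] * len(beta)
    for zi in z:
        w = sigmoid(-dot(zi, beta))
        for j, zij in enumerate(zi):
            g[j] -= w * zij / n
    return g


def train_nesterov(z, iters=30, lr=1.0):
    """Nesterov's accelerated gradient (assumed textbook schedule, plaintext only).

    beta is the current estimate; v is the look-ahead point where the
    gradient is evaluated. lr and iters are illustrative choices.
    """
    d = len(z[0])
    beta = [0.0] * d
    v = [0.0] * d
    lam = 1.0
    for _ in range(iters):
        g = gradient(v, z)
        new_beta = [vj - lr * gj for vj, gj in zip(v, g)]
        next_lam = (1.0 + math.sqrt(1.0 + 4.0 * lam * lam)) / 2.0
        gamma = (1.0 - lam) / next_lam  # zero on the first step, negative afterwards
        # Momentum step: extrapolate past new_beta, away from the previous beta.
        v = [(1.0 - gamma) * nb + gamma * b for nb, b in zip(new_beta, beta)]
        beta, lam = new_beta, next_lam
    return beta


# Hypothetical toy data: rows are z_i = y_i * (1, x_i) with y_i in {-1, +1}.
toy_z = [[1.0, 0.8], [1.0, 0.5], [-1.0, 0.6], [-1.0, 0.9]]
toy_beta = train_nesterov(toy_z)
print("loss at zero:", loss([0.0, 0.0], toy_z))
print("loss after training:", loss(toy_beta, toy_z))
```

The momentum term is what lets the method reach a given loss in fewer iterations than plain gradient descent, which matters when every iteration is expensive. An encrypted variant would additionally need each step to be expressed in the arithmetic the encryption scheme supports, which this sketch does not attempt.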

[1] E. Dietz. Application of Logistic Regression and Logistic Discrimination in Medical Decision Making, 1987.

[2] P. Warner. Ordinal logistic regression, 2008, Journal of Family Planning and Reproductive Health Care.

[3] Payman Mohassel, et al. SecureML: A System for Scalable Privacy-Preserving Machine Learning, 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[4] Jung Hee Cheon, et al. Bootstrapping for Approximate Homomorphic Encryption, 2018, IACR Cryptol. ePrint Arch.

[5] Xiaoqian Jiang, et al. Secure Logistic Regression based on Homomorphic Encryption, 2018, IACR Cryptol. ePrint Arch.

[6] Sheng Wang, et al. The Application of Deep Learning in Biomedical Informatics, 2018, 2018 International Conference on Robots & Intelligent System (ICRIS).

[7] Rob Hall, et al. Achieving Both Valid and Secure Logistic Regression Analysis on Aggregated Data from Different Private Sources, 2012, J. Priv. Confidentiality.

[8] Pascal Paillier, et al. Public-Key Cryptosystems Based on Composite Degree Residuosity Classes, 1999, EUROCRYPT.

[9] Jung Hee Cheon, et al. Homomorphic Encryption for Arithmetic of Approximate Numbers, 2017, ASIACRYPT.

[10] R. Harrison, et al. Early diagnosis of acute myocardial infarction using clinical and electrocardiographic data at presentation: derivation and evaluation of logistic regression models, 1996, European heart journal.

[11] Yang Wang, et al. PrivLogit: Efficient Privacy-preserving Logistic Regression by Tailoring Numerical Optimizers, 2016, ArXiv.

[12] Guang-Zhong Yang, et al. Deep Learning for Health Informatics, 2017, IEEE Journal of Biomedical and Health Informatics.

[13] Tadanori Teruya, et al. Privacy-preservation for Stochastic Gradient Descent Application to Secure Logistic Regression, 2013.

[14] E G Lowrie, et al. Death risk in hemodialysis patients: the predictive value of commonly measured variables and an evaluation of death rate differences between facilities, 1990, American journal of kidney diseases: the official journal of the National Kidney Foundation.

[15] Martin R. Albrecht, et al. On the concrete hardness of Learning with Errors, 2015, J. Math. Cryptol.

[16] Xiaoqian Jiang, et al. Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation, 2018, IACR Cryptol. ePrint Arch.

[17] Stratis Ioannidis, et al. Privacy-Preserving Ridge Regression on Hundreds of Millions of Records, 2013, 2013 IEEE Symposium on Security and Privacy.

[18] Yoshinori Aono, et al. Scalable and Secure Logistic Regression via Homomorphic Encryption, 2016, IACR Cryptol. ePrint Arch.

[19] R. Davies, et al. Logistic Regression Models in Sociological Research, 2009.

[20] A. Yao, et al. Fair exchange with a semi-trusted third party (extended abstract), 1997, CCS '97.

[21] Jose C Florez, et al. Introduction to genetic association studies, 2007, The Journal of investigative dermatology.

[22] Murat Kantarcioglu, et al. A secure distributed logistic regression protocol for the detection of rare adverse drug events, 2012, J. Am. Medical Informatics Assoc.

[23] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.

[24] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.