Privacy-preserving logistic regression training

Background: Logistic regression is a popular machine learning technique for constructing classification models. Since building such models involves computing over large datasets, it is appealing to outsource this computation to a cloud service. However, the privacy-sensitive nature of the input data requires appropriate privacy-preserving measures before the data is outsourced. Homomorphic encryption enables computation directly on encrypted data, without decryption, and can therefore mitigate the privacy concerns raised by using a cloud service.

Methods: In this paper, we propose an algorithm (and its implementation) to train a logistic regression model on a homomorphically encrypted dataset. The core of our algorithm is a new iterative method that can be seen as a simplified form of the fixed Hessian method, but with much lower multiplicative complexity.

Results: We test the new method on two real-life applications. The first, in medicine, builds a model that predicts the probability that a patient has cancer, given genomic data as input; the second, in finance, builds a model that predicts the probability that a credit card transaction is fraudulent. For both applications the method produces accurate results, comparable to those obtained by running standard algorithms on plaintext data.

Conclusions: This article introduces a new, simple iterative algorithm for training a logistic regression model, tailored to homomorphically encrypted datasets. The algorithm can serve as a privacy-preserving technique for building binary classification models and applies to the wide range of problems that can be modelled with logistic regression. Our implementation results show that the method can handle the large datasets used in logistic regression training.
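The abstract describes the core method only as a simplified fixed Hessian iteration with lower multiplicative complexity. As a rough plaintext illustration of that idea (not the authors' algorithm, and with no encryption at all), the sketch below starts from Böhning's fixed Hessian bound -1/4 XᵀX and replaces it with a diagonal matrix holding its row sums. Assuming all features are non-negative, that diagonal matrix is still a valid lower bound on the Hessian, so each update needs only one division per coefficient instead of a matrix inversion. The function names (`diagonal_bound`, `train_diagonal_fixed_hessian`) and the toy dataset are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical plaintext sketch: no encryption is performed here. It illustrates
# a fixed-Hessian-style iteration whose Hessian bound is diagonal (an assumption,
# not necessarily the exact construction used in the paper).

Matrix = Sequence[Sequence[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(X: Matrix, y: Sequence[int], beta: Sequence[float]) -> float:
    """Logistic log-likelihood: sum_i y_i * z_i - log(1 + exp(z_i))."""
    total = 0.0
    for row, yi in zip(X, y):
        z = sum(b * x for b, x in zip(beta, row))
        # log(1 + e^z) computed stably as max(z, 0) + log1p(e^-|z|)
        total += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def diagonal_bound(X: Matrix) -> List[float]:
    """B_k = 1/4 * sum_i x_ik * sum_j x_ij (row sums of X^T X, divided by 4).

    If every x_ij >= 0, diag(B) - X^T X / 4 is a weighted graph Laplacian and
    hence positive semidefinite, so -diag(B) still lower-bounds the Hessian.
    """
    row_sums = [sum(row) for row in X]
    d = len(X[0])
    return [0.25 * sum(row[k] * s for row, s in zip(X, row_sums)) for k in range(d)]


def train_diagonal_fixed_hessian(
    X: Matrix, y: Sequence[int], iterations: int = 200
) -> Tuple[List[float], List[float]]:
    """Return the coefficients and the log-likelihood after every iteration."""
    if any(x < 0 for row in X for x in row):
        raise ValueError("the diagonal bound assumes non-negative features")
    d = len(X[0])
    bound = diagonal_bound(X)  # computed once, as in the fixed Hessian method
    beta = [0.0] * d
    history = [log_likelihood(X, y, beta)]
    for _ in range(iterations):
        # Gradient of the log-likelihood: X^T (y - sigmoid(X beta)).
        residuals = [
            yi - sigmoid(sum(b * x for b, x in zip(beta, row)))
            for row, yi in zip(X, y)
        ]
        grad = [sum(r * row[k] for r, row in zip(residuals, X)) for k in range(d)]
        # Diagonal bound: one division per coefficient, no matrix inversion.
        beta = [b + g / bk for b, g, bk in zip(beta, grad, bound)]
        history.append(log_likelihood(X, y, beta))
    return beta, history


if __name__ == "__main__":
    xs = [i / 19 for i in range(20)]
    X = [[1.0, x] for x in xs]  # intercept column plus one feature in [0, 1]
    y = [1 if i >= 10 else 0 for i in range(20)]
    y[8], y[11] = 1, 0  # flip two labels near the decision boundary
    beta, history = train_diagonal_fixed_hessian(X, y)
    print("coefficients:", beta)
    print("log-likelihood: start", history[0], "end", history[-1])
```

Because the bound holds for every β, no step can decrease the log-likelihood (the usual monotonicity of quadratic lower-bound methods). The price is slower convergence than Newton's method. An encrypted version would also have to replace the sigmoid with a low-degree polynomial, since schemes such as FV evaluate only additions and multiplications; this sketch does not attempt that.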