When Homomorphic Encryption Marries Secret Sharing: Secure Large-Scale Sparse Logistic Regression and Applications in Risk Control

Logistic Regression (LR) is the most widely used machine learning model in industry because of its efficiency, robustness, and interpretability. Because data is isolated across organizations and applications demand high model performance, many industrial applications call for a secure and efficient LR model that multiple parties can build jointly. Most existing work uses either Homomorphic Encryption (HE) or Secret Sharing (SS) to build secure LR. HE-based methods can handle high-dimensional sparse features, but they incur potential security risks. SS-based methods have provable security, but they suffer from efficiency issues under high-dimensional sparse features. In this paper, we first present CAESAR, which combines HE and SS to build a secure large-scale sparse logistic regression model that achieves both efficiency and security. We then present a distributed implementation of CAESAR to meet the scalability requirement. We have deployed CAESAR in a risk control task and conducted comprehensive experiments. Our experimental results show that CAESAR improves on the state-of-the-art model by around 130 times.
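
To illustrate why SS-based methods can struggle with sparse features, the following is a minimal sketch of two-party additive secret sharing. It is not the CAESAR protocol; the ring size `MODULUS`, the helper names, and the toy feature vector are all assumptions made for illustration. The point it shows is that each share of a sparse vector is uniformly random, so the shares are dense even when the input is almost all zeros.

```python
import secrets

MODULUS = 2 ** 64  # illustrative ring size (an assumption, not from the paper)


def share(vector):
    """Split each entry into two additive shares modulo MODULUS."""
    share_a = [secrets.randbelow(MODULUS) for _ in vector]
    share_b = [(x - r) % MODULUS for x, r in zip(vector, share_a)]
    return share_a, share_b


def reconstruct(share_a, share_b):
    """Recover the original entries by adding the two shares."""
    return [(a + b) % MODULUS for a, b in zip(share_a, share_b)]


def count_nonzero(vector):
    """Number of nonzero entries, a simple proxy for storage and compute cost."""
    return sum(1 for x in vector if x != 0)


# A toy sparse feature vector: 1000 dimensions, only 2 nonzero entries.
sparse_features = [0] * 1000
sparse_features[3] = 1
sparse_features[512] = 1

share_a, share_b = share(sparse_features)

print("nonzeros in input:  ", count_nonzero(sparse_features))
print("nonzeros in share A:", count_nonzero(share_a))
print("nonzeros in share B:", count_nonzero(share_b))
```

Because the shares lose the sparsity of the input, any computation on them has to touch every dimension, which is one way to read the efficiency issue described above.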
