Faster Secure Data Mining via Distributed Homomorphic Encryption

Driven by rising privacy demands in data mining, Homomorphic Encryption (HE) has recently received increasing attention for its ability to perform computations directly over encrypted data. With HE, model learning can be securely outsourced to public cloud environments that are powerful but not fully trusted. However, HE-based training scales poorly because of its high computational complexity, and whether HE can be applied to large-scale problems remains an open question. In this paper, we propose a novel, general distributed HE-based data mining framework as a step toward solving this scaling problem. The main idea of our approach is to accept slightly more communication overhead in exchange for a shallower computational circuit in HE, thereby reducing the overall complexity. We verify the efficiency and effectiveness of the new framework by testing it on various data mining algorithms and benchmark datasets. For example, we train a logistic regression model to distinguish the digits 3 and 8 in around 5 minutes, whereas a centralized counterpart needs almost 2 hours.
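The abstract does not spell out the protocol, so the following is only a toy illustration of the general trade-off it names (more communication, shallower circuit), not the framework proposed in the paper. It uses no real HE library: `MockCiphertext`, `refresh`, and `train` are hypothetical names, the "ciphertext" merely carries a plaintext value together with a count of ciphertext-by-ciphertext multiplications (the multiplicative depth), and the refresh step, a round trip to a party that re-encrypts the value, is an assumption made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MockCiphertext:
    """Hypothetical stand-in for an HE ciphertext (no real encryption).

    Stores the plaintext value and the multiplicative depth consumed so far.
    """
    value: float
    depth: int = 0

    def __add__(self, other):
        return MockCiphertext(self.value + other.value, max(self.depth, other.depth))

    def __sub__(self, other):
        return MockCiphertext(self.value - other.value, max(self.depth, other.depth))

    def __mul__(self, other):
        # Ciphertext-by-ciphertext multiplication deepens the circuit by one level.
        return MockCiphertext(self.value * other.value, max(self.depth, other.depth) + 1)

    def scale(self, factor):
        # Simplification: multiplying by a public scalar is treated as free.
        return MockCiphertext(self.value * factor, self.depth)


def refresh(ct):
    """Assumed communication round: a fresh encryption of the same value (depth 0)."""
    return MockCiphertext(ct.value, 0)


def train(x, y, steps, lr=0.1, refresh_every=None):
    """1-D least-squares gradient descent on mock ciphertexts.

    Returns (final weight, maximum depth reached, number of refresh messages).
    """
    w = MockCiphertext(0.0)
    max_depth, messages = 0, 0
    for t in range(1, steps + 1):
        residual = x * w - y          # one multiplication
        grad = x * residual           # a second multiplication
        w = w - grad.scale(lr)
        max_depth = max(max_depth, w.depth)
        if refresh_every and t % refresh_every == 0:
            w = refresh(w)            # pay one message, reset the depth
            messages += 1
    return w.value, max_depth, messages


if __name__ == "__main__":
    x, y = MockCiphertext(2.0), MockCiphertext(6.0)
    print("no refresh:   ", train(x, y, steps=10))
    print("refresh every 2:", train(x, y, steps=10, refresh_every=2))
```

In this toy model the two runs produce the same weight, but the run without refreshes needs a circuit whose depth grows with the number of iterations, while the run with refreshes keeps the depth bounded at the cost of extra messages. Since the cost of HE operations generally grows with the depth the parameters must support, this is the kind of exchange the abstract describes.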
