SecureML: A System for Scalable Privacy-Preserving Machine Learning

Machine learning is widely used in practice to produce predictive models for applications such as image processing, speech and text recognition. These models are more accurate when trained on large amount of data collected from different sources. However, the massive data collection raises privacy concerns. In this paper, we present new and efficient protocols for privacy preserving machine learning for linear regression, logistic regression and neural network training using the stochastic gradient descent method. Our protocols fall in the two-server model where data owners distribute their private data among two non-colluding servers who train various models on the joint data using secure two-party computation (2PC). We develop new techniques to support secure arithmetic operations on shared decimal numbers, and propose MPC-friendly alternatives to non-linear functions such as sigmoid and softmax that are superior to prior work. We implement our system in C++. Our experiments validate that our protocols are several orders of magnitude faster than the state of the art implementations for privacy preserving linear and logistic regressions, and scale to millions of data samples with thousands of features. We also implement the first privacy preserving system for training neural networks.

[1]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, FOCS 1982.

[2]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[3]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2000, Journal of Cryptology.

[4]  Ran Canetti,et al.  Universally composable security: a new paradigm for cryptographic protocols , 2001, Proceedings 2001 IEEE International Conference on Cluster Computing.

[5]  Wenliang Du,et al.  Privacy-preserving cooperative scientific computations , 2001, Proceedings. 14th IEEE Computer Security Foundations Workshop, 2001..

[6]  Yuval Ishai,et al.  Extending Oblivious Transfers Efficiently , 2003, CRYPTO.

[7]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[8]  Xiaodong Lin,et al.  Privacy preserving regression modelling via distributed computation , 2004, KDD.

[9]  Yunghsiang Sam Han,et al.  Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification , 2004, SDM.

[10]  Steve R. Gunn,et al.  Result Analysis of the NIPS 2003 Feature Selection Challenge , 2004, NIPS.

[11]  Rebecca N. Wright,et al.  Privacy-preserving distributed k-means clustering over arbitrarily partitioned data , 2005, KDD '05.

[12]  Jaideep Vaidya,et al.  Privacy-Preserving SVM Classification on Vertically Partitioned Data , 2006, PAKDD.

[13]  Aleksandra Slavkovic,et al.  "Secure" Logistic Regression of Horizontally and Vertically Partitioned Distributed Databases , 2007 .

[14]  Z. Shang,et al.  Using Statistics and Spatial Data Mining to Study Land Cover in Wyoming :Can We Predict Vegetation Types from Environmental Variables? , 2007 .

[15]  Rafail Ostrovsky,et al.  Secure two-party k-means clustering , 2007, CCS '07.

[16]  Jaideep Vaidya,et al.  Knowledge and Information Systems , 2007 .

[17]  Vladimir Kolesnikov,et al.  Improved Garbled Circuit: Free XOR Gates and Applications , 2008, ICALP.

[18]  Yehuda Lindell,et al.  A Proof of Security of Yao’s Protocol for Two-Party Computation , 2009, Journal of Cryptology.

[19]  Ivan Damgård,et al.  Homomorphic encryption and secure comparison , 2008, Int. J. Appl. Cryptogr..

[20]  Kamalika Chaudhuri,et al.  Privacy-preserving logistic regression , 2008, NIPS.

[21]  Brent Waters,et al.  A Framework for Efficient and Composable Oblivious Transfer , 2008, CRYPTO.

[22]  Mariana Raykova,et al.  Outsourcing Multi-Party Computation , 2011, IACR Cryptol. ePrint Arch..

[23]  Mihir Bellare,et al.  Foundations of garbled circuits , 2012, CCS.

[24]  Yehuda Lindell,et al.  More efficient oblivious transfer and extensions for faster secure computation , 2013, CCS.

[25]  Stratis Ioannidis,et al.  Privacy-Preserving Ridge Regression on Hundreds of Millions of Records , 2013, 2013 IEEE Symposium on Security and Privacy.

[26]  Mihir Bellare,et al.  Efficient Garbling from a Fixed-Key Blockcipher , 2013, 2013 IEEE Symposium on Security and Privacy.

[27]  Stratis Ioannidis,et al.  Privacy-preserving matrix factorization , 2013, CCS.

[28]  Tadanori Teruya,et al.  Privacy-preservation for Stochastic Gradient Descent Application to Secure Logistic Regression , 2013 .

[29]  Roi Livni,et al.  On the Computational Efficiency of Training Neural Networks , 2014, NIPS.

[30]  Michael Zohner,et al.  ABY - A Framework for Efficient Mixed-Protocol Secure Two-Party Computation , 2015, NDSS.

[31]  Vitaly Shmatikov,et al.  Privacy-preserving deep learning , 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[32]  Stratis Ioannidis,et al.  GraphSC: Parallel Secure Computation Made Easy , 2015, 2015 IEEE Symposium on Security and Privacy.

[33]  Alex J. Malozemoff,et al.  Faster Two-Party Computation Secure Against Malicious Adversaries in the Single-Execution Setting , 2016, IACR Cryptol. ePrint Arch..

[34]  Mariana Raykova,et al.  Secure Linear Regression on Vertically Partitioned Datasets , 2016, IACR Cryptol. ePrint Arch..

[35]  Michael Naehrig,et al.  CryptoNets: applying neural networks to encrypted data with high throughput and accuracy , 2016, ICML 2016.

[36]  Yoshinori Aono,et al.  Scalable and Secure Logistic Regression via Homomorphic Encryption , 2016, IACR Cryptol. ePrint Arch..

[37]  Kim Laine,et al.  Secure Data Exchange: A Marketplace in the Cloud , 2016, IACR Cryptol. ePrint Arch..

[38]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[39]  R. Hardwarsing Stochastic Gradient Descent with Differentially Private Updates , 2018 .