CryptoML: Secure outsourcing of big data machine learning applications

We present CryptoML, the first practical framework for provably secure and efficient delegation of a broad class of contemporary matrix-based machine learning (ML) applications on massive datasets. In CryptoML, a delegating client with limited memory and computational resources assigns storage and ML-related computation to cloud servers while preserving the privacy of its data. We first identify the dominant components of delegation cost and devise a matrix sketching technique that minimizes this cost through data pre-processing. We then propose a novel interactive delegation protocol based on provably secure Shamir's secret sharing; the protocol is tailored to our sketching technique to maximize the client's resource efficiency. CryptoML exposes a new trade-off between the efficiency of secure delegation and the accuracy of the ML task. Proof-of-concept evaluations corroborate the applicability of CryptoML to datasets with billions of non-zero records.
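To make the underlying primitive concrete, the following is a minimal, illustrative sketch of Shamir's (t, n) secret sharing over a prime field, the building block the delegation protocol above relies on. The field modulus, threshold, and share count are illustrative choices, not parameters from the paper, and a deployment would draw randomness from a cryptographically secure source rather than Python's `random`.

```python
import random

P = 2**61 - 1  # illustrative prime field modulus (a Mersenne prime)

def share(secret, t, n):
    """Split `secret` into n shares; any t of them suffice to reconstruct it."""
    # Random degree-(t-1) polynomial with the secret as constant term.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    # Share i is the polynomial evaluated at x = i, for i = 1..n.
    return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange-interpolate the shared polynomial at x = 0 to recover the secret."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem).
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

secret = 123456789
shares = share(secret, t=3, n=5)
assert reconstruct(shares[:3]) == secret
```

Note that the shares are additively homomorphic: summing the servers' shares of two secrets yields shares of their sum, which is what makes secret-shared linear algebra on outsourced matrices possible in the first place.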
