CryptoML: Secure outsourcing of big data machine learning applications

We present CryptoML, the first practical framework for provably secure and efficient delegation of a broad class of contemporary matrix-based machine learning (ML) applications on massive datasets. In CryptoML, a delegating client with limited memory and computational resources assigns storage and ML-related computation to cloud servers while preserving the privacy of its data. We first identify the dominant components of delegation cost and devise a matrix sketching technique that minimizes this cost through data pre-processing. We then propose a novel interactive delegation protocol based on provably secure Shamir's secret sharing; the protocol is tailored to our sketching technique to maximize the client's resource efficiency. CryptoML exposes a new trade-off between the efficiency of secure delegation and the accuracy of the ML task. Proof-of-concept evaluations corroborate the applicability of CryptoML to datasets with billions of non-zero records.
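To make the underlying primitive concrete, the following is a minimal, illustrative sketch of Shamir's (t, n) secret sharing over a prime field, the building block the delegation protocol above relies on. The field modulus, threshold, and share count are illustrative choices, not parameters from the paper, and a deployment would draw randomness from a cryptographically secure source rather than Python's `random`.

```python
import random

P = 2**61 - 1  # illustrative prime field modulus (a Mersenne prime)

def share(secret, t, n):
    """Split `secret` into n shares; any t of them suffice to reconstruct it."""
    # Random degree-(t-1) polynomial with the secret as constant term.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    # Share i is the polynomial evaluated at x = i, for i = 1..n.
    return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange-interpolate the shared polynomial at x = 0 to recover the secret."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem).
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

secret = 123456789
shares = share(secret, t=3, n=5)
assert reconstruct(shares[:3]) == secret
```

Note that the shares are additively homomorphic: summing the servers' shares of two secrets yields shares of their sum, which is what makes secret-shared linear algebra on outsourced matrices possible in the first place.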
