Oblivious Multi-Party Machine Learning on Trusted Processors

Privacy-preserving multi-party machine learning allows multiple organizations to perform collaborative data analytics while guaranteeing the privacy of their individual datasets. Using trusted SGX-processors for this task yields high performance, but requires a careful selection, adaptation, and implementation of machine-learning algorithms to provably prevent the exploitation of any side channels induced by data-dependent access patterns. We propose data-oblivious machine learning algorithms for support vector machines, matrix factorization, neural networks, decision trees, and k-means clustering. We show that our efficient implementation based on Intel Skylake processors scales up to large, realistic datasets, with overheads several orders of magnitude lower than with previous approaches based on advanced cryptographic multi-party computation schemes.

[1]  Carlos V. Rozas,et al.  Innovative instructions and software model for isolated execution , 2013, HASP '13.

[2]  Antonio Criminisi,et al.  Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning , 2012, Found. Trends Comput. Graph. Vis..

[3]  John Riedl,et al.  Application of Dimensionality Reduction in Recommender System - A Case Study , 2000 .

[4]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[5]  Somesh Jha,et al.  Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing , 2014, USENIX Security Symposium.

[6]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[7]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[8]  Angela Wiesberg Machine learning on encrypted data , 2018 .

[9]  Ittai Anati,et al.  Innovative Technology for CPU Based Attestation and Sealing , 2013 .

[10]  Yehuda Lindell,et al.  Privacy Preserving Data Mining , 2000, Journal of Cryptology.

[11]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[12]  Michael Hicks,et al.  Wysteria: A Programming Language for Generic, Mixed-Mode Multiparty Computations , 2014, 2014 IEEE Symposium on Security and Privacy.

[13]  Sanjit A. Seshia,et al.  A design and verification methodology for secure isolated regions , 2016, PLDI.

[14]  金田 重郎,et al.  C4.5: Programs for Machine Learning (書評) , 1995 .

[15]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[16]  Elisa Bertino,et al.  Query Processing , 1997, Multimedia Databases in Perspective.

[17]  Andrew Chi-Chih Yao,et al.  Protocols for Secure Computations (Extended Abstract) , 1982, FOCS.

[18]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[19]  Andrew W. Fitzgibbon,et al.  Secrets of Matrix Factorization: Approximations, Numerics, Manifold Optimization and Random Restarts , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[21]  Gernot Heiser,et al.  Last-Level Cache Side-Channel Attacks are Practical , 2015, 2015 IEEE Symposium on Security and Privacy.

[22]  Christos Gkantsidis,et al.  VC3: Trustworthy Data Analytics in the Cloud Using SGX , 2015, 2015 IEEE Symposium on Security and Privacy.

[23]  Christos Gkantsidis,et al.  Observing and Preventing Leakage in MapReduce , 2015, CCS.

[24]  Silvio Micali,et al.  How to play ANY mental game , 1987, STOC.

[25]  Taneli Mielikäinen,et al.  Cryptographically private support vector machines , 2006, KDD '06.

[26]  E. Szemerédi,et al.  O(n LOG n) SORTING NETWORK. , 1983 .

[27]  Yehuda Lindell,et al.  Secure Multiparty Computation for Privacy-Preserving Data Mining , 2009, IACR Cryptol. ePrint Arch..

[28]  Vitaly Shmatikov,et al.  Privacy-preserving deep learning , 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[29]  Michael Naehrig,et al.  ML Confidential: Machine Learning on Encrypted Data , 2012, ICISC.

[30]  Kartik Nayak,et al.  Oblivious Data Structures , 2014, IACR Cryptol. ePrint Arch..

[31]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[32]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[33]  Radu Sion,et al.  TrustedDB: A Trusted Hardware-Based Database with Privacy and Data Confidentiality , 2011, IEEE Transactions on Knowledge and Data Engineering.

[34]  János Komlós,et al.  An 0(n log n) sorting network , 1983, STOC.

[35]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[36]  Marcus Peinado,et al.  Controlled-Channel Attacks: Deterministic Side Channels for Untrusted Operating Systems , 2015, 2015 IEEE Symposium on Security and Privacy.

[37]  Dr. P. Suresh Varma,et al.  A TRUSTED HARDWARE-BASED DATABASE WITH PRIVACY AND DATA CONFIDENTIALITY USING SECURE COPROCESSORS , 2015 .

[38]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[39]  F. Maxwell Harper,et al.  The MovieLens Datasets: History and Context , 2016, TIIS.

[40]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[41]  Galen C. Hunt,et al.  Shielding Applications from an Untrusted Cloud with Haven , 2014, OSDI.

[42]  Ashay Rane,et al.  Raccoon: Closing Digital Side-Channels through Obfuscated Execution , 2015, USENIX Security Symposium.

[43]  Ahmad-Reza Sadeghi,et al.  Secure Evaluation of Private Linear Branching Programs with Medical Applications , 2009, ESORICS.

[44]  Craig Gentry,et al.  Fully homomorphic encryption using ideal lattices , 2009, STOC '09.

[45]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[46]  Ramarathnam Venkatesan,et al.  Orthogonal Security with Cipherbase , 2013, CIDR.

[47]  Jason Weston,et al.  Support vector machines for multi-class pattern recognition , 1999, ESANN.

[48]  Benny Pinkas,et al.  Fairplay - Secure Two-Party Computation System , 2004, USENIX Security Symposium.

[49]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[50]  Shafi Goldwasser,et al.  Machine Learning Classification over Encrypted Data , 2015, NDSS.

[51]  Andrew Chi-Chih Yao,et al.  Protocols for secure computations , 1982, FOCS 1982.

[52]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[53]  Michael Naehrig,et al.  Privately Evaluating Decision Trees and Random Forests , 2016, IACR Cryptol. ePrint Arch..

[54]  Pengtao Xie,et al.  Crypto-Nets: Neural Networks over Encrypted Data , 2014, ArXiv.

[55]  Stratis Ioannidis,et al.  Privacy-preserving matrix factorization , 2013, CCS.

[56]  Samuel Madden,et al.  Processing Analytical Queries over Encrypted Data , 2013, Proc. VLDB Endow..

[57]  Somesh Jha,et al.  Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures , 2015, CCS.

[58]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[59]  Rafail Ostrovsky,et al.  Software protection and simulation on oblivious RAMs , 1996, JACM.

[60]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[61]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[62]  Kenneth E. Batcher,et al.  Sorting networks and their applications , 1968, AFIPS Spring Joint Computing Conference.

[63]  Yehuda Koren,et al.  Matrix Factorization Techniques for Recommender Systems , 2009, Computer.

[64]  Eli Upfal,et al.  The Melbourne Shuffle: Improving Oblivious Storage in the Cloud , 2014, ICALP.

[65]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[66]  Stratis Ioannidis,et al.  GraphSC: Parallel Secure Computation Made Easy , 2015, 2015 IEEE Symposium on Security and Privacy.

[67]  Juan del Cuvillo,et al.  Using innovative instructions to create trustworthy software solutions , 2013, HASP '13.

[68]  Ping Chen,et al.  Practical Secure Decision Tree Learning in a Teletreatment Application , 2014, Financial Cryptography.

[69]  Yehuda Koren,et al.  Lessons from the Netflix prize challenge , 2007, SKDD.

[70]  Rafail Ostrovsky,et al.  On the (in)security of hash-based oblivious RAM and a new balancing scheme , 2012, SODA.

[71]  Raghav Kaushik,et al.  Oblivious Query Processing , 2013, ICDT.

[72]  Elaine Shi,et al.  Path ORAM: an extremely simple oblivious RAM protocol , 2012, CCS.

[73]  Andrew Chi-Chih Yao,et al.  How to Generate and Exchange Secrets (Extended Abstract) , 1986, FOCS.

[74]  Elaine Shi,et al.  GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation , 2015, ASPLOS.

[75]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[76]  Kartik Nayak,et al.  ObliVM: A Programming Framework for Secure Computation , 2015, 2015 IEEE Symposium on Security and Privacy.