S3ML: A Secure Serving System for Machine Learning Inference

We present S3ML, a secure serving system for machine learning inference in this paper. S3ML runs machine learning models in Intel SGX enclaves to protect users' privacy. S3ML designs a secure key management service to construct flexible privacy-preserving server clusters and proposes novel SGX-aware load balancing and scaling methods to satisfy users' Service-Level Objectives. We have implemented S3ML based on Kubernetes as a low-overhead, high-available, and scalable system. We demonstrate the system performance and effectiveness of S3ML through extensive experiments on a series of widely-used models.

[1]  Ion Stoica,et al.  Opaque: An Oblivious and Encrypted Distributed Analytics Platform , 2017, NSDI.

[2]  Shweta Shinde,et al.  Panoply: Low-TCB Linux Applications With SGX Enclaves , 2017, NDSS.

[3]  Dan Boneh,et al.  Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware , 2018, ICLR.

[4]  Galen C. Hunt,et al.  Shielding Applications from an Untrusted Cloud with Haven , 2014, OSDI.

[5]  Rajeev Balasubramonian,et al.  VAULT: Reducing Paging Overheads in SGX with Efficient Integrity Verification Structures , 2018, ASPLOS.

[6]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[7]  Carl A. Gunter,et al.  Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX , 2017, CCS.

[8]  Jonathan M. Smith,et al.  USENIX Association , 2000 .

[9]  Shivaram Venkataraman,et al.  Parity models: erasure-coded resilience for prediction serving systems , 2019, SOSP.

[10]  Eric Rescorla,et al.  The Transport Layer Security (TLS) Protocol Version 1.3 , 2018, RFC.

[11]  Wei Jin,et al.  USENIX Association Proceedings of USITS ’ 03 : 4 th USENIX Symposium on Internet Technologies and Systems , 2003 .

[12]  Xin Wang,et al.  Clipper: A Low-Latency Online Prediction Serving System , 2016, NSDI.

[13]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[14]  Li Shuangfeng,et al.  TensorFlow Lite: On-Device Machine Learning Framework , 2020 .

[15]  Christof Fetzer,et al.  TensorSCONE: A Secure TensorFlow Framework using Intel SGX , 2019, ArXiv.

[16]  Thomas F. Wenisch,et al.  Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution , 2018, USENIX Security Symposium.

[17]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[18]  Srinivas Devadas,et al.  Intel SGX Explained , 2016, IACR Cryptol. ePrint Arch..

[19]  Donald E. Porter,et al.  Graphene-SGX: A Practical Library OS for Unmodified Applications on SGX , 2017, USENIX Annual Technical Conference.

[20]  Eric Rescorla,et al.  The Transport Layer Security (TLS) Protocol Version 1.1 , 2006, RFC.

[21]  David M. Eyers,et al.  SCONE: Secure Linux Containers with Intel SGX , 2016, OSDI.

[22]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[23]  Yubin Xia,et al.  Occlum: Secure and Efficient Multitasking Inside a Single Enclave of Intel SGX , 2020, ASPLOS.

[24]  Maurice Herlihy,et al.  Wait-free synchronization , 1991, TOPL.

[25]  沈中林,et al.  利用集群技术构建Linux Virtual Server , 2000 .

[26]  Yinqian Zhang,et al.  SgxPectre: Stealing Intel Secrets From SGX Enclaves via Speculative Execution , 2020, IEEE Security & Privacy.

[27]  Yuan Xiao,et al.  SgxPectre: Stealing Intel Secrets from SGX Enclaves Via Speculative Execution , 2018, 2019 IEEE European Symposium on Security and Privacy (EuroS&P).

[28]  Valerio Schiavoni,et al.  Stress-SGX: Load and Stress your Enclaves for Fun and Profit , 2019, ArXiv.

[29]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[30]  Prashant J. Shenoy,et al.  Agile dynamic provisioning of multi-tier Internet applications , 2008, TAAS.

[31]  Christopher Olston,et al.  TensorFlow-Serving: Flexible, High-Performance ML Serving , 2017, ArXiv.

[32]  Bo Chen,et al.  Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Christoforos E. Kozyrakis,et al.  Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[34]  Wei Wang,et al.  MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving , 2019, USENIX Annual Technical Conference.

[35]  Tim Dierks,et al.  The Transport Layer Security (TLS) Protocol Version 1.2 , 2008 .

[36]  Beng Chin Ooi,et al.  Rafiki: Machine Learning as an Analytics Service System , 2018, Proc. VLDB Endow..

[37]  Dawson R. Engler,et al.  Exokernel: an operating system architecture for application-level resource management , 1995, SOSP.

[38]  Kapil Vaswani,et al.  EnclaveDB: A Secure Database Using SGX , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[39]  Christos Gkantsidis,et al.  VC3: Trustworthy Data Analytics in the Cloud Using SGX , 2015, 2015 IEEE Symposium on Security and Privacy.

[40]  Moustafa Ghanem,et al.  Future Generation Computer Systems ( ) – Future Generation Computer Systems Enabling Cost-aware and Adaptive Elasticity of Multi-tier Cloud Applications , 2022 .

[41]  Sameh Elnikety,et al.  Swayam: distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency , 2017, Middleware.