Paper Information - S3ML: A Secure Serving System for Machine Learning Inference


In this paper, we present S3ML, a secure serving system for machine learning inference. S3ML runs machine learning models inside Intel SGX enclaves to protect users' privacy. It provides a secure key management service for building flexible privacy-preserving server clusters, and it proposes novel SGX-aware load balancing and scaling methods to meet users' Service-Level Objectives. We have implemented S3ML on top of Kubernetes as a low-overhead, highly available, and scalable system. We demonstrate the performance and effectiveness of S3ML through extensive experiments on a series of widely used models.
