Empowering Data Centers for Next Generation Trusted Computing

Modern data centers have grown beyond CPU nodes to provide domain-specific accelerators such as GPUs and FPGAs to their customers. From a security standpoint, cloud customers want to protect their data, and they are willing to pay additional costs for trusted execution environments (TEEs) such as enclaves provided by Intel SGX and AMD SEV. Unfortunately, customers have to make a critical choice: either use domain-specific accelerators for speed or use CPU-based confidential computing solutions. To bridge this gap, we aim to enable data-center-scale confidential computing that expands across CPUs and accelerators. We argue that having wide-scale TEE support for accelerators presents a technically easier solution, but is far from being a reality. Instead, our hybrid design provides enclaved execution guarantees for computation distributed over multiple CPU nodes and devices with or without TEE support. Our solution scales gracefully in two dimensions: it can handle a large number of heterogeneous nodes, and it can accommodate TEE-enabled devices as and when they become available in the future. We observe marginal overheads of 0.42–8% on real-world AI data center workloads, and these overheads are independent of the number of nodes in the data center. We add custom TEE support to two accelerators (AI and storage) and integrate it into our solution, thus demonstrating that it can cater to future TEE devices.
