Stratus: Clouds with Microarchitectural Resource Management

The emerging next generation of cloud services, such as granular and serverless computing, is pushing the boundaries of current cloud infrastructure. To meet performance objectives, researchers are now leveraging low-level microarchitectural resources in clouds. At the same time, these resources are a major source of security problems that can compromise the confidentiality and integrity of sensitive data in multi-tenant shared cloud infrastructures. The core of the problem is a lack of isolation, caused by the unsupervised sharing of microarchitectural resources across different performance and security boundaries. In this paper, we introduce Stratus clouds, which treat isolation of microarchitectural elements as the key design principle when allocating cloud resources. This isolation improves both performance and security, but at the cost of reduced resource utilization. Stratus captures this trade-off using a novel abstraction that we call isolation credit, and we show how it can help both providers and tenants when allocating microarchitectural resources through Stratus's declarative interface. We conclude by discussing the challenges of realizing Stratus clouds today.
