Hodor: Intra-Process Isolation for High-Throughput Data Plane Libraries

As network, I/O, accelerator, and NVM devices capable of a million operations per second make their way into data centers, the software stack managing such devices has been shifting from implementations within the operating system kernel to more specialized kernel-bypass approaches. While the in-kernel approach guarantees safety and provides resource multiplexing, it imposes too much overhead on microsecondscale tasks. Kernel-bypass approaches improve throughput substantially but sacrifice safety and complicate resource management: if applications are mutually distrusting, then either each application must have exclusive access to its own device or else the device itself must implement resource management. This paper shows how to attain both safety and performance via intra-process isolation for data plane libraries. We propose protected libraries as a new OS abstraction which provides separate user-level protection domains for different services (e.g., network and in-memory database), with performance approaching that of unprotected kernel bypass. We also show how this new feature can be utilized to enable sharing of data plane libraries across distrusting applications. Our proposed solution uses Intel’s memory protection keys (PKU) in a safe way to change the permissions associated with subsets of a single address space. In addition, it uses hardware watchpoints to delay asynchronous event delivery and to guarantee independent failure of applications sharing a protected library. We show that our approach can efficiently protect highthroughput in-memory databases and user-space network stacks. Our implementation allows up to 2.3 million library entrances per second per core, outperforming both kernellevel protection and two alternative implementations that use system calls and Intel’s VMFUNC switching of user-level address spaces, respectively.

[1]  Will Dietz,et al.  Nested Kernel: An Operating System Architecture for Intra-Kernel Privilege Separation , 2015, ASPLOS.

[2]  Bennet S. Yee,et al.  Adapting Software Fault Isolation to Contemporary CPU Architectures , 2010, USENIX Security Symposium.

[3]  Yubin Xia,et al.  Reducing world switches in virtualized environment with flexible cross-world calls , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[4]  Christoforos E. Kozyrakis,et al.  IX: A Protected Dataplane Operating System for High Throughput and Low Latency , 2014, OSDI.

[5]  Eunyoung Jeong,et al.  mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems , 2014, NSDI.

[6]  Edouard Bugnion,et al.  ZygOS: Achieving Low Tail Latency for Microsecond-scale Networked Tasks , 2017, SOSP.

[7]  Carl A. Waldspurger,et al.  Speculative Buffer Overflows: Attacks and Defenses , 2018, ArXiv.

[8]  Bhavani M. Thuraisingham,et al.  Differentiating Code from Data in x86 Binaries , 2011, ECML/PKDD.

[9]  Christoforos E. Kozyrakis,et al.  Usenix Association 10th Usenix Symposium on Operating Systems Design and Implementation (osdi '12) 335 Dune: Safe User-level Access to Privileged Cpu Features , 2022 .

[10]  Long Lu,et al.  Shreds: Fine-Grained Execution Units with Private Memory , 2016, 2016 IEEE Symposium on Security and Privacy (SP).

[11]  Yutao Liu,et al.  Thwarting Memory Disclosure with Efficient Hypervisor-enforced Intra-domain Isolation , 2015, CCS.

[12]  Xi Chen,et al.  No Need to Hide: Protecting Safe Regions on Commodity Hardware , 2017, EuroSys.

[13]  Dong Du,et al.  EPTI: Efficient Defence against Meltdown Attack for Unpatched VMs , 2018, USENIX Annual Technical Conference.

[14]  Eddie Kohler,et al.  Speedy transactions in multicore in-memory databases , 2013, SOSP.

[15]  Mihai Budiu,et al.  Control-flow integrity principles, implementations, and applications , 2009, TSEC.

[16]  Martín Abadi,et al.  An Overview of the Singularity Project , 2005 .

[17]  Chris Hawblitzel,et al.  Safe to the last instruction: automated verification of a type-safe operating system , 2010, PLDI '10.

[18]  Peter G. Neumann,et al.  CHERI: A Hybrid Capability-System Architecture for Scalable Software Compartmentalization , 2015, 2015 IEEE Symposium on Security and Privacy.

[19]  Michael Hamburg,et al.  Meltdown: Reading Kernel Memory from User Space , 2018, USENIX Security Symposium.

[20]  Robert Wahbe,et al.  Efficient software-based fault isolation , 1994, SOSP '93.

[21]  Mark Handley,et al.  Wedge: Splitting Applications into Reduced-Privilege Compartments , 2008, NSDI.

[22]  Zhi Wang,et al.  HyperSafe: A Lightweight Approach to Provide Lifetime Hypervisor Control-Flow Integrity , 2010, 2010 IEEE Symposium on Security and Privacy.

[23]  Zeyu Mi,et al.  SkyBridge: Fast and Secure Inter-Process Communication for Microkernels , 2019, EuroSys.

[24]  G. Ramalingam,et al.  The undecidability of aliasing , 1994, TOPL.

[25]  Michael L. Scott,et al.  Disengaged scheduling for fair, protected access to fast computational accelerators , 2014, ASPLOS.

[26]  Peter Druschel,et al.  ERIM: Secure and Efficient In-process Isolation with Memory Protection Keys , 2018, ArXiv.

[27]  Derek Bruening,et al.  Efficient, transparent, and comprehensive runtime code manipulation , 2004 .

[28]  David A. Patterson,et al.  In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).

[29]  Timothy Roscoe,et al.  Arrakis , 2014, OSDI.

[30]  Vikram S. Adve,et al.  Memory Safety for Low-Level Software/Hardware Interactions , 2009, USENIX Security Symposium.

[31]  Michio Honda,et al.  StackMap: Low-Latency Networking with the OS Stack and Dedicated NICs , 2016, USENIX Annual Technical Conference.

[32]  Terence Kelly,et al.  Dalí: A Periodically Persistent Hash Map , 2017, DISC.

[33]  Peter Druschel,et al.  Light-Weight Contexts: An OS Abstraction for Safety and Performance , 2016, OSDI.

[34]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[35]  Muli Ben-Yehuda,et al.  CODOMs: Protecting software with Code-centric memory Domains , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[36]  Byung-Gon Chun,et al.  Usenix Association 10th Usenix Symposium on Operating Systems Design and Implementation (osdi '12) 135 Megapipe: a New Programming Interface for Scalable Network I/o , 2022 .

[37]  Martín Abadi,et al.  XFI: software guards for system address spaces , 2006, OSDI '06.

[38]  Frank Piessens,et al.  A Systematic Evaluation of Transient Execution Attacks and Defenses , 2018, USENIX Security Symposium.

[39]  Michael Hamburg,et al.  Spectre Attacks: Exploiting Speculative Execution , 2018, 2019 IEEE Symposium on Security and Privacy (SP).

[40]  Stefan Mangard,et al.  KASLR is Dead: Long Live KASLR , 2017, ESSoS.

[41]  Jerome H. Saltzer,et al.  The protection of information in computer systems , 1975, Proc. IEEE.

[42]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[43]  Patrick Th. Eugster,et al.  Enforcing Least Privilege Memory Views for Multithreaded Applications , 2016, CCS.