I'm Not Dead Yet!: The Role of the Operating System in a Kernel-Bypass Era

Researchers have long predicted the demise of the operating system [21, 26, 41]. As datacenter servers increasingly incorporate I/O devices that let applications bypass the OS kernel (e.g., RDMA [12] and DPDK [15] network devices or SPDK storage devices), this prediction may finally come true. While kernel-bypass devices do eliminate the OS kernel from the I/O path, they do not handle the kernel's most important job: offering higher-level abstractions. This paper argues for a new high-level, device-agnostic I/O abstraction for kernel-bypass devices. We propose the Demikernel, a new library OS architecture for kernel-bypass devices. It defines a high-level, kernel-bypass I/O abstraction and provides user-space library OSes that implement that abstraction across a range of kernel-bypass devices. The Demikernel makes applications easier to build and portable across devices, and lets them run unmodified as devices continue to evolve.
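To make the idea of a device-agnostic I/O abstraction concrete, here is a minimal sketch in Python. It is not the Demikernel API: the names `IOQueue`, `DequeBackend`, `ByteStreamBackend`, and `round_trip` are hypothetical, and both backends are in-memory stand-ins rather than real kernel-bypass devices. The sketch only illustrates the layering the abstract describes, with one interface, several library implementations behind it, and application code that is written once against the interface.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class IOQueue(ABC):
    """Hypothetical device-agnostic I/O interface (an assumption, not the paper's API)."""

    @abstractmethod
    def push(self, data: bytes) -> None:
        """Submit one message to the device."""

    @abstractmethod
    def pop(self) -> Optional[bytes]:
        """Return the next complete message, or None if nothing is ready."""


class DequeBackend(IOQueue):
    """Stand-in for a device that already preserves message boundaries."""

    def __init__(self) -> None:
        self._messages = deque()

    def push(self, data: bytes) -> None:
        self._messages.append(bytes(data))

    def pop(self) -> Optional[bytes]:
        return self._messages.popleft() if self._messages else None


class ByteStreamBackend(IOQueue):
    """Stand-in for a device that exposes only a raw byte stream.

    The library layer adds a 4-byte length prefix so that callers still
    see whole messages, as they do with DequeBackend.
    """

    def __init__(self) -> None:
        self._buffer = bytearray()

    def push(self, data: bytes) -> None:
        self._buffer += len(data).to_bytes(4, "big") + data

    def pop(self) -> Optional[bytes]:
        if len(self._buffer) < 4:
            return None
        length = int.from_bytes(self._buffer[:4], "big")
        if len(self._buffer) < 4 + length:
            return None  # message not fully buffered yet
        message = bytes(self._buffer[4:4 + length])
        del self._buffer[:4 + length]
        return message


def round_trip(queue: IOQueue, message: bytes) -> Optional[bytes]:
    """Application code: depends on IOQueue only, never on a concrete backend."""
    queue.push(message)
    return queue.pop()


if __name__ == "__main__":
    for backend in (DequeBackend(), ByteStreamBackend()):
        print(type(backend).__name__, round_trip(backend, b"hello"))
```

Because `round_trip` touches only the `IOQueue` interface, swapping one backend for another requires no change to the application, which is the portability property the abstract claims for the Demikernel.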

[1]  Scott Shenker, et al. Revisiting network support for RDMA, 2018, SIGCOMM.

[2]  Michio Honda, et al. PASTE: A Network Programming Interface for Non-Volatile Main Memory, 2018, NSDI.

[3]  Christoforos E. Kozyrakis, et al. Shinjuku: Preemptive Scheduling for μsecond-scale Tail Latency, 2019, NSDI.

[4]  David G. Andersen, et al. Using RDMA efficiently for key-value services, 2015, SIGCOMM.

[5]  Kang Chen, et al. RFP: When RPC is Faster than Server-Bypass with RDMA, 2017, EuroSys.

[6]  Robin Fairbairns, et al. The Design and Implementation of an Operating System to Support Distributed Multimedia Applications, 1996, IEEE J. Sel. Areas Commun.

[7]  Jeffrey C. Mogul, et al. TCP Offload Is a Dumb Idea Whose Time Has Come, 2003, HotOS.

[8]  Greg Kroah-Hartman, et al. Linux device drivers - where the Kernel meets the hardware (3. ed.), 2005.

[9]  Andy Currid, et al. TCP Offload to the Rescue, 2004, ACM Queue.

[10]  Brad Fitzpatrick, et al. Distributed caching with memcached, 2004.

[11]  Kathryn S. McKinley, et al. Hoard: a scalable memory allocator for multithreaded applications, 2000, SIGP.

[12]  Luigi Rizzo, et al. netmap: A Novel Framework for Fast Packet I/O, 2012, USENIX ATC.

[13]  Srinivasan Seshan, et al. Hyperloop: group-based NIC-offloading to accelerate replicated transactions in multi-tenant storage systems, 2018, SIGCOMM.

[14]  Greg Kroah-Hartman, et al. Linux Device Drivers, 3rd Edition, 2005.

[15]  Steven McCanne, et al. The BSD Packet Filter: A New Architecture for User-level Packet Capture, 1993, USENIX Winter.

[16]  Dawson R. Engler, et al. Exokernel: an operating system architecture for application-level resource management, 1995, SOSP.

[17]  George Varghese, et al. P4: programming protocol-independent packet processors, 2013, CCRV.

[18]  Michael M. Swift, et al. Nooks: an architecture for reliable device drivers, 2002, EW 10.

[19]  Gu-Yeon Wei, et al. Profiling a warehouse-scale computer, 2015, ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[20]  Haibo Chen, et al. Fast In-Memory Transaction Processing Using RDMA and HTM, 2017, ACM Trans. Comput. Syst.

[21]  John K. Ousterhout, et al. Pseudo Devices: User-Level Extensions to the Sprite File System, 1988.

[22]  Jason Evans. A Scalable Concurrent malloc(3) Implementation for FreeBSD, 2006.

[23]  Gustavo Alonso, et al. DPI: The Data Processing Interface for Modern Networks (Extended Abstract), 2019, BTW.

[24]  Haibo Chen, et al. Deconstructing RDMA-enabled Distributed Transactions: Hybrid is Better!, 2018, OSDI.

[25]  Michael Kaminsky, et al. Datacenter RPCs can be General and Fast, 2018, NSDI.

[26]  Michio Honda, et al. StackMap: Low-Latency Networking with the OS Stack and Dedicated NICs, 2016, USENIX Annual Technical Conference.

[27]  Thomas E. Anderson, et al. TAS: TCP Acceleration as an OS Service, 2019, EuroSys.

[28]  Renato Recio, et al. An RDMA Protocol Specification, 2002.

[29]  Timothy Roscoe, et al. Arrakis, 2014, OSDI.

[30]  Hari Balakrishnan, et al. The Case for Moving Congestion Control Out of the Datapath, 2017, HotNets.

[31]  Christoforos E. Kozyrakis, et al. ReFlex: Remote Flash ≈ Local Flash, 2017, ASPLOS.

[32]  Jinyang Li, et al. Using One-Sided RDMA Reads to Build a Fast, CPU-Efficient Key-Value Store, 2013, USENIX ATC.

[33]  Gernot Heiser, et al. User-Level Device Drivers: Achieved Performance, 2005, Journal of Computer Science and Technology.

[34]  Donald E. Porter, et al. Rethinking the library OS from the top down, 2011, ASPLOS XVI.

[35]  Thomas E. Anderson, et al. Strata: A Cross Media File System, 2017, SOSP.

[36]  Gustavo Alonso, et al. Fast and strongly-consistent per-item resilience in key-value stores, 2018, EuroSys.

[37]  Miguel Castro, et al. FaRM: Fast Remote Memory, 2014, NSDI.

[38]  Rastislav Bodík, et al. Floem: A Programming System for NIC-Accelerated Network Applications, 2018, OSDI.

[39]  Thomas E. Anderson, et al. Ingress Pipeline Queues Packet Buffer DMA PipelineDMA Egress Pipeline, 2015.

[40]  David G. Andersen, et al. FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs, 2016, OSDI.

[41]  Christoforos E. Kozyrakis, et al. IX: A Protected Dataplane Operating System for High Throughput and Low Latency, 2014, OSDI.

[42]  Eunyoung Jeong, et al. mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems, 2014, NSDI.