PASTE: A Network Programming Interface for Non-Volatile Main Memory

Non-Volatile Main Memory (NVMM) devices have been integrated into general-purpose operating systems through familiar file-based interfaces, providing efficient byte-granularity access by bypassing page caches. To leverage the unique advantages of these high-performance media, the storage stack is migrating from the kernel into user space. However, application performance remains fundamentally limited unless network stacks explicitly integrate these new storage media and follow the migration of storage stacks into user space. Moreover, we argue that the storage and network stacks must be designed together for NVMM. This requires a thoroughly new network stack design, including low-level buffer management and APIs. We propose PASTE, a new network programming interface for NVMM. It supports familiar abstractions (including busy-polling, blocking, protection, and run-to-completion) with standard network protocols such as TCP and UDP. By operating directly on NVMM, it can be closely integrated with the persistence layer of applications. Once data is DMA'ed from a network interface card to host memory (NVMM), it never needs to be copied again, even for persistence. We demonstrate the general applicability of PASTE by implementing two popular persistent data structures: a write-ahead log and a B+ tree. We further apply PASTE to three applications: Redis, a popular persistent key-value store; pKVS, our HTTP-based key-value store; and the logging component of a software switch. These demonstrate that PASTE not only accelerates networked storage but also enables conventional networking functions to support new features.
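The claim that a payload DMA'ed into NVMM never has to be copied again, even to make it durable, can be illustrated with a toy write-ahead log that records where a payload lives instead of copying it. The following is a minimal Python sketch under loudly stated assumptions: an ordinary `bytearray` stands in for NVMM, `BufferPool.receive` stands in for NIC DMA, and the names `BufferPool`, `RefLog`, `BUF_SIZE`, and the 8-byte record layout are hypothetical. They are not PASTE's actual API or on-media format. The sketch persists nothing. On real NVMM, the commented point in `append` is where cache-line flushes and a store fence would be needed.

```python
import struct

BUF_SIZE = 2048                 # hypothetical fixed per-packet buffer size
RECORD = struct.Struct("<IHH")  # hypothetical record: buffer index, offset, length


class BufferPool:
    """Toy stand-in for an NVMM region carved into packet buffers."""

    def __init__(self, nbufs: int):
        self.mem = bytearray(nbufs * BUF_SIZE)  # plain DRAM here, NOT persistent
        self.free = list(range(nbufs))

    def receive(self, payload: bytes) -> tuple:
        """Simulate the NIC DMA-ing one payload into a free buffer."""
        if len(payload) > BUF_SIZE:
            raise ValueError("payload larger than a buffer")
        if not self.free:
            raise RuntimeError("no free buffers")
        idx = self.free.pop(0)
        base = idx * BUF_SIZE
        self.mem[base:base + len(payload)] = payload
        return idx, 0, len(payload)

    def view(self, idx: int, off: int, length: int) -> memoryview:
        """Zero-copy view of a payload that already sits in the pool."""
        base = idx * BUF_SIZE + off
        return memoryview(self.mem)[base:base + length]


class RefLog:
    """Write-ahead log whose records reference payloads instead of copying them."""

    def __init__(self, pool: BufferPool):
        self.pool = pool
        self.records = bytearray()  # toy stand-in for a log region

    def append(self, idx: int, off: int, length: int) -> None:
        # Only an 8-byte record is written; the payload stays where it landed.
        # The logged buffer is never returned to pool.free, so it is not reused.
        # Real NVMM: flush the payload and record cache lines, then fence, here.
        self.records += RECORD.pack(idx, off, length)

    def replay(self):
        """Recovery: walk the records and yield the referenced payloads."""
        for idx, off, length in RECORD.iter_unpack(self.records):
            yield bytes(self.pool.view(idx, off, length))


pool = BufferPool(4)
wal = RefLog(pool)
for msg in (b"SET k1 v1", b"SET k2 v2"):
    wal.append(*pool.receive(msg))
print(list(wal.replay()))  # [b'SET k1 v1', b'SET k2 v2']
print(len(wal.records))    # 16
```

The design choice being illustrated is that the log grows by a small fixed-size record per request, independent of payload size. The cost is that logged buffers must be kept out of the receive pool until the log is truncated.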
