PMNet: In-Network Data Persistence

To guarantee data persistence, storage workloads (such as key-value stores and databases) typically use a synchronous protocol that places network and server-stack latency on the critical path of request processing. Fast, byte-addressable persistent memory (PM) has helped mitigate the storage overhead of the server stack; yet networking remains a dominant factor in the end-to-end latency of request processing. Emerging programmable network devices can reduce network latency by moving parts of the application's compute into the network (e.g., caching results for read requests); for update requests, however, the client still has to stall until the server persistently commits the update.

In this work, we introduce in-network data persistence, which extends the data-persistence domain from servers to the network, and present PMNet, a programmable data plane (e.g., a switch or NIC) equipped with PM for persisting data in the network. PMNet logs incoming update requests and acknowledges clients directly, without making them wait for the server to commit the request. In case of a failure, the logged requests act as redo logs from which the server recovers. We implement PMNet on an FPGA and evaluate its performance using common PM workloads, including key-value stores and PM-backed applications. Our evaluation shows that PMNet improves the throughput of update requests by 4.31× on average and the 99th-percentile tail latency by 3.23×.
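The log-then-acknowledge idea can be illustrated with a minimal Python sketch. It is not the FPGA implementation: the class names (`InNetworkLog`, `Server`), the per-request sequence numbers, and the idempotent `commit` are all assumptions made for illustration, and a Python list stands in for the device's persistent memory.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical key-value server; its committed state survives a crash."""
    store: dict = field(default_factory=dict)
    applied_seq: int = 0  # highest sequence number committed so far

    def commit(self, seq, key, value):
        # Assumed idempotence: ignore requests that were already committed,
        # so replaying the whole log after a failure is safe.
        if seq <= self.applied_seq:
            return
        self.store[key] = value
        self.applied_seq = seq


@dataclass
class InNetworkLog:
    """Hypothetical stand-in for the PM-equipped switch or NIC."""
    log: list = field(default_factory=list)  # models the persistent log
    next_seq: int = 1

    def handle_update(self, key, value):
        # Persist the update in the device first, then acknowledge the
        # client immediately; the server commit is off the critical path.
        seq = self.next_seq
        self.next_seq += 1
        self.log.append((seq, key, value))
        return seq  # the returned sequence number plays the role of the ACK

    def replay(self, server):
        # After a failure, the logged requests serve as a redo log.
        for seq, key, value in self.log:
            server.commit(seq, key, value)


if __name__ == "__main__":
    device, server = InNetworkLog(), Server()
    for key, value in [("a", 1), ("b", 2), ("a", 3)]:
        device.handle_update(key, value)  # clients are acknowledged here

    # The server commits only the first request before it fails.
    server.commit(*device.log[0])

    # Recovery: redo the logged requests in order.
    device.replay(server)
    print(server.store)
```

In this sketch the client's wait ends when `handle_update` returns, which is the point of extending the persistence domain into the network. Log truncation and the handling of device failures are left out because the abstract does not describe them.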
