Hardware-Supported Remote Persistence for Distributed Persistent Memory

The advent of Persistent Memory (PM) requires Remote Direct Memory Access (RDMA) technologies to evolve to support remote data persistence. Previous software-based solutions require remote CPU intervention and delay the visibility of remote persistence. In this paper, we design several hardware-supported RDMA primitives that flush data from the volatile cache of RDMA Network Interface Cards (RNICs) to PM. Building on these RDMA Flush primitives, we also propose durable RPCs that support remote data persistence and fast failure recovery. We emulate the performance of the RDMA Flush primitives using other RDMA primitives, and compare our proposals with several state-of-the-art RPCs on a real testbed equipped with PM and InfiniBand networks. Experimental results show that our proposals improve RPC throughput by up to 90% and reduce 99th-percentile latency by up to 49%. The experimental studies also provide guidelines for designing RDMA-based distributed PM systems.
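
The persistence gap that motivates a flush primitive can be illustrated with a toy model. The sketch below is not the paper's design and uses no real RDMA library: the class `ToyRemoteNode` and its methods `rdma_write`, `rdma_flush`, and `power_failure` are hypothetical names chosen for illustration. It assumes only what the abstract states, namely that the RNIC holds written data in a volatile cache until it is flushed to PM.

```python
class ToyRemoteNode:
    """Toy model (assumption): a volatile RNIC cache sitting in front of PM."""

    def __init__(self):
        self.rnic_cache = {}  # volatile: contents are lost on power failure
        self.pm = {}          # persistent: contents survive power failure

    def rdma_write(self, addr, value):
        # In this model a write lands in the RNIC cache only,
        # so completing the write does not imply durability.
        self.rnic_cache[addr] = value

    def rdma_flush(self):
        # Hypothetical flush primitive: drain cached writes into PM
        # without involving the remote CPU.
        self.pm.update(self.rnic_cache)
        self.rnic_cache.clear()

    def power_failure(self):
        # Only the volatile cache is wiped; PM is left untouched.
        self.rnic_cache.clear()


def demo():
    # Case 1: write, then crash before any flush.
    unflushed = ToyRemoteNode()
    unflushed.rdma_write(0x10, "log-entry")
    unflushed.power_failure()

    # Case 2: write, flush, then crash.
    flushed = ToyRemoteNode()
    flushed.rdma_write(0x10, "log-entry")
    flushed.rdma_flush()
    flushed.power_failure()

    return unflushed.pm, flushed.pm
```

Calling `demo()` returns the PM contents of both nodes after the simulated power failure: the node that was never flushed has lost the write, while the flushed node retains it. A durable RPC would, under this model, report completion only after the flush step.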
