Challenges and solutions for fast remote persistent memory access

Non-volatile main memory DIMMs (NVMMs), such as Intel's Optane DC Persistent Memory modules, provide data durability with orders of magnitude higher performance than prior durable technologies. This paper explores the challenges that arise when building high-performance networked systems for NVMM. We find that, compared to DRAM, NVMM has distinctive fundamental properties that complicate networked access, both from the NIC and from the CPU. We show that many of the challenges in efficiently accessing remote NVMM arise because CPU caches are not optimized for NVMM. To address these challenges, we propose a menu of solutions for current hardware and evaluate their benefits.
