Remote regions: a simple abstraction for remote memory

We propose an intuitive abstraction that lets a process export its memory to remote hosts and access the memory exported by others. This abstraction provides a simpler interface to RDMA and other remote memory technologies than the existing verbs interface. The key idea is that a process can export parts of its memory as files, called remote regions, which other processes access through the usual file system operations (read, write, memory map, etc.). We implemented this abstraction in the Linux kernel and evaluated it. We show that remote regions are easy to use and perform close to RDMA. We demonstrate this with micro-benchmarks and by adapting two single-host in-memory applications, R and Metis, to use remote memory. With R, using remote regions requires no changes to the code and allows R to run with remote memory that exceeds the physical memory of a host. With Metis, the modifications amount to ≈100 lines of code and allow Metis to scale its performance across 8 hosts.
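To make the file-based access pattern concrete, the following minimal sketch shows the same bytes being written through a memory mapping and read back through ordinary file I/O. It is an illustration only, under stated assumptions: an ordinary local temporary file stands in for a remote region, and the file name `region0`, the 4096-byte size, and all helper names are hypothetical. The abstract does not specify how remote regions are named, created, or exported, so none of this reflects the actual kernel interface.

```python
import mmap
import os
import tempfile

REGION_SIZE = 4096  # hypothetical size; one typical page


def create_region(path: str, size: int) -> None:
    """Stand-in for exporting memory: create a local file of `size` bytes."""
    with open(path, "wb") as f:
        f.truncate(size)


def write_via_mmap(path: str, offset: int, data: bytes) -> None:
    """Map the whole file into memory and store `data` at `offset`."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as region:
            region[offset:offset + len(data)] = data
            region.flush()  # push the modified pages back to the file


def read_via_file_io(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` with plain seek and read calls."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo() -> bytes:
    """Write through a mapping, then read the same bytes through file I/O."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "region0")  # hypothetical name
        create_region(path, REGION_SIZE)
        write_via_mmap(path, 128, b"hello")
        return read_via_file_io(path, 128, 5)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that an application written against files needs only standard calls (open, seek, read, memory map) to reach the data, which is consistent with the abstract's report that R ran on remote regions without code changes.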
