Memory-Oriented Distributed Computing at Rack Scale

Introduction: Recent advances provide the building blocks for constructing rack-scale architectures with a large pool of disaggregated non-volatile memory (NVM) that can be shared across a high-performance system interconnect by decentralized compute resources. NVDIMMs and emerging NVM technologies provide byte-addressable persistent storage accessible through loads and stores, rather than through the block I/O path used today. High-performance system interconnects, such as Gen-Z, OmniPath, and RDMA over InfiniBand, provide low-latency access from compute nodes to fabric-attached memory (e.g., microsecond-scale remote memory latencies are already possible with RDMA). Disaggregated memory architectures share several characteristics:

1) a high-capacity pool of memory that can be shared by heterogeneous computing resources at low latency;
2) a partially disaggregated architecture that treats node-local memory as private and disaggregated memory as shared;
3) a heterogeneous memory system containing both volatile DRAM and NVM;
4) unmediated access from a compute node to disaggregated memory, provided by one-sided loads/stores or gets/puts and facilitated by atomic operations (e.g., compare-and-swap, as in RDMA or Gen-Z);
5) hardware-enforced cache coherence domains limited to a single compute node; and
6) a separation of fault domains between processing and disaggregated memory.

MODC: Our goal is to investigate how to program this emerging class of system architectures. We propose memory-oriented distributed computing (MODC), an approach to building system runtimes that exploits disaggregated memory to facilitate work distribution, coordination, and fault tolerance. Global state is maintained as shared data structures in disaggregated memory that are visible to all participating processes, rather than being physically partitioned. Because processes on all nodes have direct access to global data structures, data can be shared efficiently without message-passing overhead.
Processes on any node are equally able to analyze and service requests for any part of the dataset, which provides better load balancing and more robust performance under skewed workloads. Shared access to global data also