A software-defined architecture and prototype for disaggregated memory rack scale systems

Disaggregation and rack-scale systems have the potential to drastically reduce TCO and increase utilization of cloud datacenters, while maintaining performance. In this paper, we present a novel rack-scale system architecture featuring software-defined remote memory disaggregation. Our hardware design and operating system extensions enable unmodified applications to dynamically attach to memory segments residing on physically remote memory pools and to use such remote segments in a byte-addressable manner, as if they were local to the application. Our system also features a control plane that automates software-defined dynamic matching of compute to memory resources, as driven by datacenter workload needs. We prototyped our system on the commercially available Zynq UltraScale+ MPSoC platform. To our knowledge, this is the first time a software-defined disaggregated system has been prototyped on commercial hardware and evaluated through industry-standard software benchmarks. Our initial results, obtained using benchmarks that are artificially highly adversarial in terms of memory bandwidth, show that disaggregated memory access exhibits a round-trip latency of only 134 clock cycles and a throughput penalty as low as 55% relative to locally attached memory. We also discuss estimates of how our findings may translate to applications with milder, more realistic levels of memory intensity, as well as avenues for innovation across the stack opened up by our work.
