Efficient Orchestration of Host and Remote Shared Memory for Memory Intensive Workloads

Since very few contributions have been made to the development of a unified memory orchestration framework for efficiently managing both host and remote idle memory, we present Valet, an efficient approach to orchestrating host and remote shared memory to improve the performance of memory-intensive workloads. The paper makes three original contributions. First, we redesign the data flow in the critical path by introducing a host-coordinated memory pool that acts as a local cache, reducing latency in the critical path of host and remote memory orchestration. Second, Valet exploits unused local memory across containers by managing local memory through the Valet host-coordinated memory pool, which allows containers to dynamically expand and shrink their memory allocations according to workload demands. Third, Valet provides an efficient remote memory reclaiming technique on remote peers, based on two optimizations: (1) an activity-based victim selection scheme that selects the least-active chunk of data to serve eviction requests, and (2) a migration protocol that moves the least-active chunk of data to a less memory-pressured remote node. As a result, Valet can effectively reduce the performance impact and migration overhead on local nodes. Our extensive experiments on both NoSQL systems and Machine Learning (ML) workloads show that, for big data and ML workloads, Valet achieves up to 226X throughput improvement and up to 98% latency reduction over the conventional OS swap facility, and up to 5.5X throughput improvement and up to 78.4% latency reduction over state-of-the-art remote paging systems. Valet is open sourced at this https URL.
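The abstract does not specify how chunk activity is measured or how memory pressure is compared across peers. The Python sketch below illustrates the two reclaiming optimizations under simple, assumed definitions: activity is a per-chunk access counter, memory pressure is the fraction of a peer's chunk slots in use, and chunk IDs are globally unique. All names (`Chunk`, `RemoteNode`, `select_victim`, `select_destination`, `handle_eviction`) are hypothetical and are not taken from Valet's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Chunk:
    chunk_id: int  # assumed globally unique across peers
    activity: int = 0  # assumed metric: number of accesses observed so far


@dataclass
class RemoteNode:
    name: str
    capacity: int  # number of chunk slots this peer is willing to host
    chunks: Dict[int, Chunk] = field(default_factory=dict)

    def pressure(self) -> float:
        # Assumed pressure metric: fraction of chunk slots already in use.
        return len(self.chunks) / self.capacity

    def has_room(self) -> bool:
        return len(self.chunks) < self.capacity

    def record_access(self, chunk_id: int) -> None:
        self.chunks[chunk_id].activity += 1


def select_victim(node: RemoteNode) -> Chunk:
    """Activity-based victim selection: pick the least-active chunk.

    Assumes the node holds at least one chunk. Ties are broken by the
    smaller chunk_id so the choice is deterministic.
    """
    return min(node.chunks.values(), key=lambda c: (c.activity, c.chunk_id))


def select_destination(source: RemoteNode, peers: List[RemoteNode]) -> Optional[RemoteNode]:
    """Pick the least memory-pressured peer (other than the source) that has room."""
    candidates = [p for p in peers if p is not source and p.has_room()]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.pressure(), p.name))


def handle_eviction(source: RemoteNode, peers: List[RemoteNode]) -> Tuple[int, Optional[str]]:
    """Serve one eviction request on `source` by migrating its least-active chunk.

    Returns (victim chunk_id, destination name). A destination of None means no
    peer had room; the caller would then need a fallback path (not modeled here).
    """
    victim = select_victim(source)
    dest = select_destination(source, peers)
    del source.chunks[victim.chunk_id]  # free the slot on the pressured peer
    if dest is None:
        return victim.chunk_id, None
    dest.chunks[victim.chunk_id] = victim  # migrate the chunk to the chosen peer
    return victim.chunk_id, dest.name


if __name__ == "__main__":
    a = RemoteNode("A", capacity=3)
    b = RemoteNode("B", capacity=4)
    c = RemoteNode("C", capacity=4)
    for cid in (1, 2, 3):
        a.chunks[cid] = Chunk(cid)
    b.chunks[10] = Chunk(10)
    b.chunks[11] = Chunk(11)
    c.chunks[20] = Chunk(20)
    a.record_access(1)
    a.record_access(1)
    a.record_access(3)
    # Chunk 2 is least active; C (1/4 full) is less pressured than B (2/4 full).
    print(handle_eviction(a, [a, b, c]))  # prints (2, 'C')
```

Choosing the destination by lowest occupancy is only one plausible reading of "less memory-pressured remote node"; an actual protocol might also weigh network distance, migration cost, or recent allocation trends.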
