Efficient Shared Memory Orchestration towards Demand Driven Memory Slicing

Memory is increasingly a bottleneck for big data and latency-sensitive applications in virtualized systems, and memory efficiency is critical for high-performance execution of virtual machines (VMs). Mechanisms proposed for improving memory utilization often rely on accurately estimating VM working set size at runtime, which is difficult under changing workloads. This paper explores opportunities for improving memory efficiency and their impact on VM performance. First, we show that if each VM is initialized with an application-specified lower bound on memory, then maintaining a shared memory region across VMs lets VMs under high memory pressure minimize their performance loss by opportunistically and transparently harvesting idle memory from other VMs as memory usage on the host varies over time. Second, we show that enabling on-demand VM memory allocation and deallocation under changing workloads effectively reduces the performance degradation caused by memory swapping, compared with the conventional configuration in which every VM is allocated the upper bound of memory requested by its applications. Third, we show that shared memory pipes between co-located VMs speed up inter-VM communication by avoiding the unnecessary overhead of communicating over the network. We develop MemLego, a lightweight shared-memory-based system, to achieve all these benefits without requiring any modification to user applications or the OSes. We demonstrate the effectiveness of these opportunities through extensive experiments on unmodified Redis and Memcached. Using MemLego, the throughput of Redis and Memcached improves by up to 4x over the native system without MemLego, and by up to two orders of magnitude when the applications' working set size does not fit in memory.
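
The demand-driven idea in the first two points can be illustrated with a toy accounting model. This is only a sketch under stated assumptions, not MemLego's implementation: the names `SharedPool`, `VM`, and `adjust`, and the use of megabyte counters in place of real pages, are hypothetical and chosen for illustration. Each VM always holds its lower bound, borrows from a host-wide pool when demand rises, and returns the surplus when demand falls.

```python
from dataclasses import dataclass


@dataclass
class SharedPool:
    """Hypothetical host-level pool of idle memory, counted in MB."""
    free_mb: int

    def borrow(self, want_mb: int) -> int:
        """Grant as much of the request as the pool can currently cover."""
        granted = min(want_mb, self.free_mb)
        self.free_mb -= granted
        return granted

    def give_back(self, mb: int) -> None:
        """Return previously borrowed memory to the pool."""
        self.free_mb += mb


@dataclass
class VM:
    """Hypothetical VM that owns a fixed lower bound plus borrowed memory."""
    name: str
    lower_bound_mb: int   # application-specified minimum, always held
    borrowed_mb: int = 0  # extra memory currently taken from the pool

    def capacity_mb(self) -> int:
        return self.lower_bound_mb + self.borrowed_mb

    def adjust(self, demand_mb: int, pool: SharedPool) -> int:
        """Resize toward the current demand.

        Returns the number of MB that still do not fit, which in a real
        system would have to be swapped.
        """
        extra_needed = max(0, demand_mb - self.lower_bound_mb)
        if extra_needed > self.borrowed_mb:
            # Demand grew: try to harvest idle memory from the pool.
            self.borrowed_mb += pool.borrow(extra_needed - self.borrowed_mb)
        else:
            # Demand shrank: release whatever is no longer needed.
            surplus = self.borrowed_mb - extra_needed
            self.borrowed_mb -= surplus
            pool.give_back(surplus)
        return max(0, demand_mb - self.capacity_mb())


pool = SharedPool(free_mb=1024)
vm_a = VM("vm-a", lower_bound_mb=512)
vm_b = VM("vm-b", lower_bound_mb=512)

vm_a.adjust(1200, pool)             # vm-a peaks and borrows from the pool
short_b = vm_b.adjust(1000, pool)   # vm-b finds the pool nearly drained
vm_a.adjust(600, pool)              # vm-a's peak passes; it returns memory
short_b_later = vm_b.adjust(1000, pool)  # vm-b can now cover its demand
```

In this model the shortfall of `vm_b` disappears once `vm_a` releases what it no longer needs, which is the temporal-variation effect the abstract describes; a real system would move pages rather than counters and would need a policy for contention.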
