Proactively Breaking Large Pages to Improve Memory Overcommitment Performance in VMware ESXi

VMware ESXi leverages the hardware support for MMU virtualization available in modern Intel and AMD CPUs. To optimize address-translation performance on such CPUs, ESXi preferentially backs a VM's guest memory with host large pages (2MB on x86-64 systems). While host large pages provide the best performance when the host has sufficient free memory, they increase host memory pressure and effectively defeat page sharing. The host is therefore more likely to reach the point where ESXi must reclaim VM memory through much more expensive techniques such as ballooning or host swapping. As a result, using host large pages can significantly hurt the consolidation ratio. To address this problem, we propose a new host large page management policy that: a) identifies 'cold' large pages and breaks them even when the host has plenty of free memory; b) breaks all large pages proactively when host free memory becomes scarce, but before the host starts ballooning or swapping; c) reclaims the small pages within the broken large pages through page sharing. With the new policy, shareable small pages can be shared much earlier, and the amount of memory that must be ballooned or swapped under high host memory pressure is greatly reduced. We also propose an algorithm that dynamically adjusts the page sharing rate during proactive large page breaking, using a per-VM large page shareability estimator for higher efficiency. Experimental results show that the proposed large page management policy improves the performance of various workloads by up to 2.1x by significantly reducing the amount of ballooned or swapped memory when host memory pressure is high, while applications still fully benefit from host large pages when memory pressure is low.
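
At its core, the policy in points a) through c) is a per-pass decision over the host's large pages. The following minimal Python sketch illustrates that decision logic under stated assumptions: the types, field names, and the 10% free-memory scarcity threshold are all hypothetical choices for illustration, not ESXi's actual data structures or tunables, and the dynamic sharing-rate adjustment is omitted.

```python
# Illustrative sketch of the proposed large page management policy.
# All names and thresholds are hypothetical; this is not ESXi code.

from dataclasses import dataclass, field
from typing import List, Tuple

SMALL_PAGES_PER_LARGE = 512  # one 2MB large page = 512 x 4KB small pages

@dataclass
class SmallPage:
    shareable: bool = False       # would deduplicate against an existing page

@dataclass
class LargePage:
    accessed_recently: bool = True  # e.g. derived from access-bit scanning
    small_pages: List[SmallPage] = field(
        default_factory=lambda: [SmallPage() for _ in range(SMALL_PAGES_PER_LARGE)])

def manage_large_pages(large_pages: List[LargePage],
                       free_mem_pct: float,
                       cold_break: bool = True,
                       scarce_pct: float = 10.0) -> Tuple[List[LargePage], int]:
    """One pass of the policy; returns (surviving large pages, shared count)."""
    remaining: List[LargePage] = []
    broken_small: List[SmallPage] = []
    scarce = free_mem_pct < scarce_pct                    # (b) memory is scarce
    for lp in large_pages:
        cold = cold_break and not lp.accessed_recently    # (a) 'cold' large page
        if scarce or cold:
            broken_small.extend(lp.small_pages)           # break 2MB into 4KB pages
        else:
            remaining.append(lp)                          # keep the large mapping
    # (c) reclaim the broken small pages via page sharing, instead of
    # falling back to ballooning or host swapping later.
    shared = sum(sp.shareable for sp in broken_small)
    return remaining, shared

# Example: plenty of free memory, so only cold pages are broken.
pages = [LargePage(accessed_recently=(i % 4 != 0)) for i in range(8)]
remaining, shared = manage_large_pages(pages, free_mem_pct=25.0)
```

With ample free memory only the cold-page path (a) fires; once `free_mem_pct` drops below the threshold, path (b) breaks every remaining large page so that sharing (c) can run before the host resorts to ballooning or swapping.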
