VMware Distributed Resource Management: Design, Implementation, and Lessons Learned

Automated management of physical resources is critical for reducing the operational costs of virtualized environments. An effective resource-management solution must provide performance isolation among virtual machines (VMs), handle resource fragmentation across physical hosts, and optimize scheduling for multiple resources. It must also utilize the underlying hardware infrastructure efficiently. In this paper, we present the design and implementation of two such management solutions: DRS and DPM. We also highlight some key lessons learned from production customer deployments over a period of more than five years.

VMware's Distributed Resource Scheduler (DRS) manages the allocation of physical resources to a set of virtual machines deployed in a cluster of hosts, each running the VMware ESX hypervisor. DRS maps VMs to hosts and performs intelligent load balancing in order to improve performance and to enforce both user-specified policies and system-level constraints. Using a variety of experiments, augmented with simulation results, we show that DRS significantly improves the overall performance of VMs running in a cluster. DRS also supports a "what-if" mode, making it possible to evaluate the impact of changes in workloads or cluster configuration.

VMware's Distributed Power Management (DPM) extends DRS with the ability to reduce power consumption by consolidating VMs onto fewer hosts. DPM recommends evacuating and powering off hosts when CPU and memory resources are lightly utilized. It recommends powering on hosts appropriately as demand increases, or as required to satisfy resource-management policies and constraints. Our extensive evaluation shows that in clusters with non-trivial periods of lowered demand, DPM reduces server power consumption significantly.
