A Fast Rejuvenation Technique for Server Consolidation with Virtual Machines

As server consolidation using virtual machines (VMs) is carried out, software aging of virtual machine monitors (VMMs) is becoming critical. Performance degradation or crash failure of a VMM affects all VMs on it. To counteract such software aging, a proactive technique called software rejuvenation has been proposed. A typical example of rejuvenation is to reboot a VMM. However, simply rebooting a VMM is undesirable because that needs rebooting operating systems on all VMs. In this paper, we propose a new technique for fast rejuvenation of VMMs called the warm-VM reboot. The warm-VM reboot enables efficiently rebooting only a VMM by suspending and resuming VMs without accessing the memory images. To achieve this, we have developed two mechanisms: on-memory suspend/resume of VMs and quick reload of VMMs. The warm- VM reboot reduces the downtime and prevents the performance degradation due to cache misses after the reboot.

[1]  William J. Bolosky,et al.  Mach: A New Kernel Foundation for UNIX Development , 1986, USENIX Summer.

[2]  Stuart I. Feldman,et al.  IGOR: a system for program debugging via reversible execution , 1988, PADD '88.

[3]  Kishor S. Trivedi,et al.  Proactive management of software aging , 2001, IBM J. Res. Dev..

[4]  David Mosberger,et al.  httperf—a tool for measuring web server performance , 1998, PERV.

[5]  Andrew Warfield,et al.  Xen and the art of virtualization , 2003, SOSP '03.

[6]  Kishor S. Trivedi,et al.  A methodology for detection and estimation of software aging , 1998, Proceedings Ninth International Symposium on Software Reliability Engineering (Cat. No.98TB100257).

[7]  Brian N. Bershad,et al.  Improving the reliability of commodity operating systems , 2005, TOCS.

[8]  Mary Baker,et al.  The Recovery Box: Using Fast Recovery to Provide High Availability in the UNIX Environment , 1992, USENIX Summer.

[9]  Andrew Warfield,et al.  Live migration of virtual machines , 2005, NSDI.

[10]  Yennun Huang,et al.  Software rejuvenation: analysis, module and applications , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[11]  Brian Randell,et al.  System structure for software fault tolerance , 1975, IEEE Transactions on Software Engineering.

[12]  Carl A. Waldspurger,et al.  Memory resource management in VMware ESX server , 2002, OSDI '02.

[13]  George Candea,et al.  Microreboot - A Technique for Cheap Recovery , 2004, OSDI.

[14]  Kishor S. Trivedi,et al.  Analysis and implementation of software rejuvenation in cluster systems , 2001, SIGMETRICS '01.