Devirtualizable virtual machines enabling general, single-node, online maintenance

Maintenance is the dominant source of downtime at high availability sites. Unfortunately, the dominant mechanism for reducing this downtime, cluster rolling upgrade, has two shortcomings that have prevented its broad acceptance. First, cluster-style maintenance over many nodes is typically performed a few nodes at a time, mak-ing maintenance slow and often impractical. Second, cluster-style maintenance does not work on single-node systems, despite the fact that their unavailability during maintenance can be painful for organizations. In this paper, we propose a novel technique for online maintenance that uses virtual machines to provide maintenance on single nodes, allowing parallel maintenance over multiple nodes, and online maintenance for standalone servers. We present the Microvisor, our prototype virtual machine system that is custom tailored to the needs of online maintenance. Unlike general purpose virtual machine environments that induce continual 10-20% over-head, the Microvisor virtualizes the hardware only during periods of active maintenance, letting the guest OS run at full speed most of the time. Unlike past attempts at virtual machine optimization, we do not compromise OS transparency. We instead give up generality and tailor our virtual machine system to the minimum needs of online maintenance, eschewing features, such as I/O and memory virtualization, that it does not strictly require. The result is a very thin virtual machine system that induces only 5.6% CPU overhead when virtualizing the hardware, and zero CPU overhead when devirtualized. Using the Microvisor, we demonstrate an online OS upgrade on a live, single-node web server, reducing downtime from one hour to less than one minute.

[1]  Robert P. Goldberg,et al.  Survey of virtual machine research , 1974, Computer.

[2]  Gerald J. Popek,et al.  Verifiable secure operating system software , 1974, AFIPS '74.

[3]  David D. Keefe Hierarchical Control Programs for Systems Evaluation , 1968, IBM Syst. J..

[4]  Ophir Frieder,et al.  On-the-fly program modification: systems for dynamic updating , 1993, IEEE Software.

[5]  Noah Treuhaft,et al.  Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies , 2002 .

[6]  Jason Nieh,et al.  Proceedings of the 5th Symposium on Operating Systems Design and Implementation , 2022 .

[7]  Mendel Rosenblum,et al.  Cellular disco: resource management using virtual clusters on shared-memory multiprocessors , 2000, TOCS.

[8]  Fred B. Schneider,et al.  Hypervisor-based fault tolerance , 1996, TOCS.

[9]  David A. Patterson,et al.  A Simple Way to Estimate the Cost of Downtime , 2002, LISA.

[10]  Alessandro Forin,et al.  UNIX as an Application Program , 1990, USENIX Summer.

[11]  Marianne Shaw,et al.  Scale and performance in the Denali isolation kernel , 2002, OSDI '02.

[12]  Deron Liang,et al.  NT-SwiFT: software implemented fault tolerance on Windows NT , 2004, J. Syst. Softw..

[13]  Monica S. Lam,et al.  Optimizing the migration of virtual computers , 2002, OPSR.

[14]  Cynthia E. Irvine,et al.  Analysis of the Intel Pentium's Ability to Support a Secure Virtual Machine Monitor , 2000, USENIX Security Symposium.

[15]  Robert Wahbe,et al.  Efficient software-based fault isolation , 1994, SOSP '93.

[16]  Scott Devine,et al.  Disco: running commodity operating systems on scalable multiprocessors , 1997, TOCS.

[17]  Dilma Da Silva,et al.  Enabling autonomic behavior in systems software with hot swapping , 2003, IBM Syst. J..

[18]  David Brumley,et al.  Virtual Appliances for Deploying and Maintaining Software , 2003, LISA.

[19]  Scott Nettles,et al.  Dynamic software updating , 2001, PLDI '01.

[20]  Steve J. Chapin,et al.  Distributed and multiprocessor scheduling , 1996, CSUR.

[21]  Carl A. Waldspurger,et al.  Memory resource management in VMware ESX server , 2002, OSDI '02.

[22]  Beng-Hong Lim,et al.  Virtualizing I/O Devices on VMware Workstation's Hosted Virtual Machine Monitor , 2001, USENIX Annual Technical Conference, General Track.

[23]  James P. Hennessy,et al.  Multiple Operating Systems on One Processor Complex , 1989, IBM Syst. J..

[24]  Jochen Liedtke,et al.  The performance of μ-kernel-based systems , 1997, SOSP.

[25]  Yuanyuan Zhou,et al.  Fast cluster failover using virtual memory-mapped communication , 1999, ICS '99.