Fast and space-efficient virtual machine checkpointing

Checkpointing, i.e., recording the volatile state of a virtual machine (VM) running as a guest in a virtual machine monitor (VMM) for later restoration, includes storing the memory available to the VM. Typically, a full image of the VM's memory is recorded along with the processor and device states. With guest memory sizes of up to several gigabytes, the size of checkpoint images is becoming a growing concern. In this work, we present a technique for fast and space-efficient checkpointing of virtual machines. In contrast to existing methods, our technique eliminates redundant data and stores only a subset of the VM's memory pages. It transparently tracks the guest's I/O operations to external storage and maintains a list of memory pages whose contents are duplicated on non-volatile storage. At a checkpoint, these pages are excluded from the checkpoint image. We have implemented the proposed technique for paravirtualized as well as fully-virtualized guests in the Xen VMM. Our experiments with a paravirtualized guest (Linux) and two fully-virtualized guests (Linux, Windows) show a significant reduction in both the size of the checkpoint image and the time required to complete the checkpoint. Compared to the current Xen implementation, we achieve on average an 81% reduction in stored data and a 74% reduction in the time required to take a checkpoint for the paravirtualized Linux guest. In a fully-virtualized environment running Windows and Linux guests, we achieve a 64% reduction in image size along with a 62% reduction in checkpointing time.
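
The bookkeeping described in the abstract (track guest disk I/O, remember which memory pages duplicate disk contents, and leave those pages out of the image) can be illustrated with a minimal Python sketch. It is not the Xen implementation: the class `DuplicateTracker`, its method names, `PAGE_SIZE = 4096`, and the simplification that each page mirrors at most one disk block are all hypothetical. The sketch assumes the VMM can observe every guest disk read and write, can learn when a tracked page is modified in memory (for example through write protection), and that the referenced disk contents are preserved until the checkpoint is restored.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed guest page size in bytes (hypothetical)


@dataclass
class DuplicateTracker:
    """Hypothetical tracker of guest pages whose contents equal a disk block."""

    # Guest page frame number -> disk block currently holding identical data.
    page_to_block: dict = field(default_factory=dict)
    # Reverse index: disk block -> set of pages that duplicate it.
    block_to_pages: dict = field(default_factory=dict)

    def _unlink(self, pfn):
        """Forget any duplicate relation recorded for page `pfn`."""
        block = self.page_to_block.pop(pfn, None)
        if block is not None:
            pages = self.block_to_pages[block]
            pages.discard(pfn)
            if not pages:
                del self.block_to_pages[block]

    def _link(self, pfn, block):
        """Record that page `pfn` and disk `block` hold the same bytes."""
        self._unlink(pfn)  # simplification: a page mirrors at most one block
        self.page_to_block[pfn] = block
        self.block_to_pages.setdefault(block, set()).add(pfn)

    def on_disk_read(self, pfn, block):
        # The guest read `block` into page `pfn`: the page now mirrors the block.
        self._link(pfn, block)

    def on_disk_write(self, pfn, block):
        # The guest wrote page `pfn` to `block`: pages that mirrored the old
        # block contents are now stale, and `pfn` mirrors the block.
        for stale in list(self.block_to_pages.get(block, ())):
            self._unlink(stale)
        self._link(pfn, block)

    def on_page_modified(self, pfn):
        # The guest changed the page in memory, so it no longer equals its block.
        self._unlink(pfn)

    def checkpoint(self, memory):
        """Split guest memory into pages to save and pfn -> block references."""
        saved, refs = {}, {}
        for pfn, contents in memory.items():
            if pfn in self.page_to_block:
                refs[pfn] = self.page_to_block[pfn]  # reload from disk on restore
            else:
                saved[pfn] = contents  # must be written to the image
        return saved, refs


if __name__ == "__main__":
    tracker = DuplicateTracker()
    tracker.on_disk_read(1, 100)   # page 1 now mirrors block 100
    tracker.on_disk_write(2, 200)  # page 2 now mirrors block 200
    memory = {pfn: bytes([pfn]) * PAGE_SIZE for pfn in range(4)}
    saved, refs = tracker.checkpoint(memory)
    print(sorted(saved), refs)
```

The reverse index from blocks to pages lets a disk write invalidate every page that mirrored the overwritten block with a single lookup, so the excluded set stays correct without rescanning memory at checkpoint time.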
