A WAN-Optimized Live Storage Migration Mechanism toward Virtual Machine Evacuation upon Severe Disasters

Wide-area VM migration can aid the recovery of IT services, since it allows virtualized servers to be evacuated to safe locations when a critical disaster strikes. However, a wide-area VM migration involves substantially more data than a VM migration within a LAN, because the virtualized storage must be transferred in addition to the memory and CPU states. This additional data makes it challenging to relocate VMs within the limited time window during which electrical power remains available. In this paper, we propose a mechanism to improve live storage migration across a WAN. The key idea is to reduce the amount of data to be transferred by proactively caching virtual disk blocks at a backup site during regular VM operation. Because these disk blocks are already cached, the proposed mechanism substantially reduces the amount of data, and consequently the time, required to live migrate the entire VM state. We evaluated the mechanism with a prototype implementation under different workloads and network conditions, and confirmed that it shortens the time needed to complete a VM live migration. With the proposed mechanism, a VM can be relocated from Japan to the United States in just under 40 seconds. Without it, the same relocation takes over 1500 seconds, so the proposed mechanism reduces the migration time by 97.5%.

key words: disaster recovery, live migration, virtual machine, virtual machine monitor
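The key idea in the abstract, pre-caching disk blocks at a backup site so that only the missing blocks cross the WAN at evacuation time, can be illustrated with a minimal sketch. This is not the authors' prototype: the class `PrecachingDisk`, its methods, and the in-memory "backup site" are hypothetical names chosen for illustration, and real network transfer is replaced by a dictionary copy.

```python
class PrecachingDisk:
    """Toy virtual disk that mirrors its blocks to a backup site (hypothetical model)."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        # Assumption: the backup site is modeled as a dict of block index -> contents.
        self.backup = {}
        # Blocks whose current contents the backup site does not hold yet.
        self.dirty = set(range(num_blocks))

    def write(self, index, data):
        """Guest write: update the local block and mark it as not yet cached remotely."""
        self.blocks[index] = data
        self.dirty.add(index)

    def precache(self, max_blocks):
        """Background step during regular operation: push up to max_blocks dirty blocks."""
        batch = sorted(self.dirty)[:max_blocks]
        for index in batch:
            self.backup[index] = self.blocks[index]
            self.dirty.discard(index)
        return len(batch)

    def migrate(self):
        """Evacuation: send only the blocks the backup site is still missing."""
        remaining = len(self.dirty)
        self.precache(remaining)
        return remaining


disk = PrecachingDisk(1000)
disk.precache(1000)                  # regular operation: whole disk cached in advance
for i in range(25):                  # guest keeps writing before the disaster
    disk.write(i, b"new data")
blocks_to_send = disk.migrate()      # only the 25 rewritten blocks remain
```

In this model the migration cost depends on how many blocks were written since the last background pass rather than on the disk size, which is the effect the abstract attributes to pre-caching.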
