Fast memory state synchronization for virtualization-based fault tolerance

migration and thus enables a new form of fault tolerance that is completely transparent to applications and operating systems. While initial prototypes show promise, virtualization-based fault-tolerant architectures still incur substantial performance overhead, especially for data-intensive workloads. The main performance challenge of virtualization-based fault tolerance is how to synchronize the memory states of the Master and Slave in a way that minimizes the end-to-end impact on application performance. This paper describes three optimization techniques for memory state synchronization: fine-grained dirty region identification, speculative state transfer, and synchronization traffic reduction using an active slave. It presents a comprehensive performance study of these techniques under three realistic workloads: the TPC-E benchmark, the SPECsfs 2008 CIFS benchmark, and a Microsoft Exchange workload. We show that these three techniques can each reduce the amount of end-of-epoch synchronization traffic by a factor of up to 7, 15, and 5, respectively.
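The abstract names fine-grained dirty region identification without saying how it is done. As a rough illustration only, the sketch below assumes one plausible approach: the Master keeps a copy of each page from the start of the epoch, compares it block by block with the page at the end of the epoch, and ships only the blocks that changed. The block size and the names `dirty_regions` and `apply_regions` are hypothetical and are not taken from the paper.

```python
PAGE_SIZE = 4096   # typical x86 page size
BLOCK_SIZE = 256   # hypothetical sub-page granularity, chosen for illustration


def dirty_regions(before: bytes, after: bytes, block_size: int = BLOCK_SIZE):
    """Return (offset, data) pairs for the blocks of `after` that differ from `before`."""
    if len(before) != len(after):
        raise ValueError("page snapshots must have the same length")
    regions = []
    for off in range(0, len(after), block_size):
        chunk = after[off:off + block_size]
        if chunk != before[off:off + block_size]:
            regions.append((off, chunk))
    return regions


def apply_regions(page: bytearray, regions) -> None:
    """Patch the Slave's copy of the page in place with the received blocks."""
    for off, data in regions:
        page[off:off + len(data)] = data


if __name__ == "__main__":
    before = bytes(PAGE_SIZE)            # page contents at the start of the epoch
    after = bytearray(before)
    after[10] = 1                        # a one-byte write
    after[3000:3004] = b"abcd"           # a four-byte write elsewhere in the page

    regions = dirty_regions(before, bytes(after))
    sent = sum(len(data) for _, data in regions)
    print(f"{len(regions)} dirty blocks, {sent} of {PAGE_SIZE} bytes sent")

    slave_page = bytearray(before)
    apply_regions(slave_page, regions)
    print("slave in sync:", slave_page == after)
```

With page-granularity tracking, both small writes would force the whole page to be transferred; with sub-page blocks, only the blocks that were touched are sent. The saving depends on how the workload's writes are spread within a page, which this sketch does not model.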

Tzi-cker Chiueh | Maohua Lu
