TCP Performance Optimization for Epoch-based Execution

General-purpose virtual machine fault tolerance (VMFT) implementations are based on an epoch-based execution model, in which outputs of a VM being protected are buffered and released to the external world at specific time points. Because this execution model increases the size and variation of the per-packet round-trip delay and disrupts the use of the delayed ACK mechanism, the TCP performance of a VM running under this execution model tends to suffer a noticeable drop. This paper describes the design, implementation and evaluation of a set of TCP performance optimizations that are meant to address the TCP performance problems caused by the epoch-based execution model. Measurements on a complete VMFT prototype implementation called Cuju demonstrate that the proposed optimizations are able to eliminate most of these TCP performance losses when MTU is 1500 bytes.
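To make the delay effect concrete, the following is a minimal sketch of output buffering under an epoch-based model. It is not Cuju's implementation: the fixed 10 ms epoch length, the assumption that buffered packets are released exactly at epoch boundaries, and the names `release_time` and `added_delays` are all hypothetical choices made for illustration.

```python
import math

def release_time(send_time_ms: float, epoch_ms: float) -> float:
    """Time at which a packet produced at send_time_ms leaves the buffer.

    Assumption: buffered outputs are released only at multiples of epoch_ms.
    """
    return math.ceil(send_time_ms / epoch_ms) * epoch_ms

def added_delays(send_times_ms, epoch_ms):
    """Extra delay each packet incurs while waiting for its epoch to end."""
    return [release_time(t, epoch_ms) - t for t in send_times_ms]

# Hypothetical packet send times (ms) inside a protected VM.
send_times = [1.0, 4.0, 9.0, 12.0, 19.0]
delays = added_delays(send_times, epoch_ms=10.0)

for t, d in zip(send_times, delays):
    print(f"sent at {t:5.1f} ms, released after {d:4.1f} ms of buffering")
```

Under these assumptions, a packet produced early in an epoch waits almost the full epoch while one produced just before the boundary waits very little, which illustrates how buffering can raise both the size and the variation of the round-trip delay seen by TCP.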

Tzi-cker Chiueh | Yifeng Sun
