TCP Performance Optimization for Epoch-based Execution

General-purpose virtual machine fault tolerance (VMFT) implementations are based on an epoch-based execution model, in which the outputs of a protected VM are buffered and released to the external world at specific points in time. Because this execution model increases both the magnitude and the variability of the per-packet round-trip delay and disrupts the delayed ACK mechanism, the TCP performance of a VM running under it tends to drop noticeably. This paper describes the design, implementation, and evaluation of a set of TCP optimizations that address the performance problems caused by the epoch-based execution model. Measurements on a complete VMFT prototype called Cuju demonstrate that the proposed optimizations eliminate most of these TCP performance losses when the MTU is 1500 bytes.
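The effect of output buffering on per-packet delay can be illustrated with a minimal sketch. It is not taken from Cuju: it assumes a fixed epoch length and that every buffered output is released exactly at the next epoch boundary, ignoring checkpoint transfer time. The names `release_time`, `added_delays`, and the 10 ms epoch are hypothetical and chosen only for illustration.

```python
# Illustrative model only: fixed-length epochs, outputs released at the
# next epoch boundary. Times are integers in microseconds to avoid
# floating-point rounding. All names and values here are assumptions.

def release_time(send_us: int, epoch_us: int) -> int:
    """Return the epoch boundary at which an output produced at send_us
    would be released (ceiling of send_us to a multiple of epoch_us)."""
    return -(-send_us // epoch_us) * epoch_us


def added_delays(send_times_us, epoch_us):
    """Extra delay each output incurs while waiting in the buffer."""
    return [release_time(t, epoch_us) - t for t in send_times_us]


if __name__ == "__main__":
    epoch_us = 10_000  # hypothetical 10 ms epoch
    sends = [1_000, 7_500, 12_000, 20_000]
    for t, d in zip(sends, added_delays(sends, epoch_us)):
        print(f"sent at {t} us, buffered for {d} us")
```

Under these assumptions, outputs produced at different offsets within an epoch wait for different amounts of time, which is one way to see why both the size and the variation of the observed round-trip delay grow.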
