Architectural Breakdown of End-to-End Latency in a TCP/IP Network

Adoption of 10 Gigabit Ethernet (10GbE) as a high-performance interconnect has been impeded by two performance considerations: (1) the processing requirements of common protocol stacks and (2) end-to-end latency. The overheads that typical software-based protocol stacks impose on CPU utilization and throughput have been well evaluated in several recent studies. We focus on end-to-end latency and present a detailed characterization across the hardware and software stack components of a typical server system. We demonstrate that application-level, end-to-end, one-way latency over a 10GbE connection can be as low as 10 μs for a single isolated request in a standard Linux network stack. The paper analyzes the components of this latency and discusses how significantly they can vary under realistic conditions. We found that methods that optimize for throughput can significantly compromise Ethernet-based latencies. We present methods for reducing the minimum latency and for controlling its variation.
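As an illustration of what "application-level one-way latency for a single isolated request" can mean in practice, the following is a minimal sketch of a TCP ping-pong measurement. It is not the paper's methodology: the function names (echo_server, recv_exact, measure_one_way_latency_us), the 64-byte payload, and the use of the loopback interface are all assumptions made for this example. One-way latency is approximated as half the round-trip time, and TCP_NODELAY is set so that small requests are not held back by Nagle's algorithm.

```python
import socket
import threading
import time


def echo_server(listener):
    """Accept one connection and echo received bytes until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so small replies are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def recv_exact(sock, n):
    """Read exactly n bytes from sock, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def measure_one_way_latency_us(rounds=200, payload=b"x" * 64):
    """Return per-request one-way latency estimates (RTT / 2) in microseconds.

    Hypothetical helper for illustration: client and server share one host
    and talk over loopback, so no NIC or wire delay is included.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    samples = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            # One isolated request: send, then wait for the full echo.
            start = time.perf_counter_ns()
            client.sendall(payload)
            recv_exact(client, len(payload))
            elapsed_ns = time.perf_counter_ns() - start
            samples.append(elapsed_ns / 2 / 1000)  # RTT/2, ns -> us

    server.join(timeout=2)
    listener.close()
    return samples


if __name__ == "__main__":
    results = sorted(measure_one_way_latency_us())
    print(f"min    {results[0]:.1f} us")
    print(f"median {results[len(results) // 2]:.1f} us")
```

Because this sketch runs over loopback and includes Python interpreter overhead, its numbers are not comparable with the 10 μs figure quoted above. Reporting the minimum alongside the median gives a rough view of both the latency floor and its variation, the two quantities the abstract distinguishes.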
