Data center TCP (DCTCP)

Cloud data centers host diverse applications, mixing workloads that require small, predictable latency with others that require large, sustained throughput. In this environment, today's state-of-the-art TCP protocol falls short. We present measurements of a 6000-server production cluster and reveal impairments that lead to high application latencies, rooted in TCP's demands on the limited buffer space available in data center switches. For example, bandwidth-hungry "background" flows build up queues at the switches and thus degrade the performance of latency-sensitive "foreground" traffic. To address these problems, we propose DCTCP, a TCP-like protocol for data center networks. DCTCP leverages Explicit Congestion Notification (ECN) in the network to provide multi-bit feedback to the end hosts. We evaluate DCTCP at 1 and 10 Gbps speeds using commodity, shallow-buffered switches. We find that DCTCP delivers the same or better throughput than TCP while using 90% less buffer space. Unlike TCP, DCTCP also provides high burst tolerance and low latency for short flows. In handling workloads derived from operational measurements, we find that DCTCP enables the applications to handle 10X the current background traffic without impacting foreground traffic. Further, a 10X increase in foreground traffic does not cause any timeouts, thus largely eliminating incast problems.
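
The abstract does not spell out how single-bit ECN marks become multi-bit feedback. As a minimal sketch, assuming the sender-side rule given in the DCTCP specification (RFC 8257), the sender keeps a running estimate `alpha` of the fraction of ECN-marked packets and shrinks its window in proportion to that estimate rather than always halving it. The function names, the gain `g = 1/16`, and the one-segment floor on the window are illustrative assumptions, not details taken from the text above.

```python
def update_alpha(alpha, marked, total, g=1.0 / 16):
    """Return the new estimate of the fraction of ECN-marked packets.

    alpha  : previous estimate, in [0, 1]
    marked : packets acknowledged with an ECN mark in the last window
    total  : all packets acknowledged in the last window
    g      : smoothing gain (assumed value; not given in the abstract)
    """
    if total <= 0:
        return alpha  # nothing was acknowledged, keep the old estimate
    fraction = marked / total
    # Exponentially weighted moving average of the marked fraction.
    return (1 - g) * alpha + g * fraction


def reduce_cwnd(cwnd, alpha, min_cwnd=1.0):
    """Shrink the congestion window in proportion to alpha.

    alpha = 1 gives the familiar TCP halving; a small alpha gives
    only a small reduction, which keeps switch queues short without
    sacrificing throughput.
    """
    return max(min_cwnd, cwnd * (1 - alpha / 2))


if __name__ == "__main__":
    alpha = 0.0
    cwnd = 100.0
    # Hypothetical window in which every acknowledged packet was marked.
    alpha = update_alpha(alpha, marked=10, total=10)
    cwnd = reduce_cwnd(cwnd, alpha)
    print(alpha, cwnd)
```

Because `alpha` changes slowly, a single marked window trims the window only slightly, while persistent marking drives `alpha` toward 1 and the reduction toward a standard TCP halving.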
