TIMELY: RTT-based Congestion Control for the Datacenter

Datacenter transports aim to deliver low-latency messaging together with high throughput. We show that simple packet delay, measured as round-trip times (RTTs) at hosts, is an effective congestion signal without the need for switch feedback. First, we show that advances in NIC hardware have made RTT measurement possible with microsecond accuracy, and that these RTTs are sufficient to estimate switch queueing. Then we describe how TIMELY can adjust transmission rates using RTT gradients to keep packet latency low while delivering high bandwidth. We implement our design in host software running over NICs with OS-bypass capabilities. Experiments with up to hundreds of machines on a Clos network topology show that it provides excellent performance: turning on TIMELY for OS-bypass messaging over a fabric with PFC lowers 99th-percentile tail latency by 9X while maintaining near line-rate throughput. Our system also outperforms DCTCP running in an optimized kernel, reducing tail latency by 13X. To the best of our knowledge, TIMELY is the first delay-based congestion control protocol for use in the datacenter, and it achieves its results despite having an order of magnitude fewer RTT signals (due to NIC offload) than earlier delay-based schemes such as Vegas.
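To make the idea of rate adjustment from RTT gradients concrete, the following is a minimal sketch of a gradient-driven rate controller. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the class name `GradientRateController`, the thresholds `t_low_us` and `t_high_us`, the smoothing weight `alpha`, the decrease factor `beta`, the additive step, and all default values are hypothetical choices made for this example. The sketch raises the rate additively while the smoothed RTT is flat or falling, and cuts it multiplicatively in proportion to the normalized gradient when the RTT is rising.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GradientRateController:
    """Illustrative RTT-gradient rate controller (all defaults are assumptions)."""

    rate_mbps: float
    min_rtt_us: float = 20.0      # assumed base RTT used to normalize the gradient
    t_low_us: float = 50.0        # below this RTT, always increase
    t_high_us: float = 500.0      # above this RTT, always decrease
    add_step_mbps: float = 10.0   # additive increase step
    beta: float = 0.8             # multiplicative decrease factor
    alpha: float = 0.875          # EWMA weight given to the newest RTT difference
    prev_rtt_us: Optional[float] = None
    rtt_diff_us: float = 0.0

    def on_rtt_sample(self, rtt_us: float) -> float:
        """Update the sending rate from one RTT sample and return the new rate."""
        if self.prev_rtt_us is None:
            # First sample: nothing to compare against yet.
            self.prev_rtt_us = rtt_us
            return self.rate_mbps

        # Smooth the difference between consecutive RTT samples.
        new_diff = rtt_us - self.prev_rtt_us
        self.prev_rtt_us = rtt_us
        self.rtt_diff_us = (1 - self.alpha) * self.rtt_diff_us + self.alpha * new_diff

        # Normalize by the base RTT and cap at 1 so the rate stays positive.
        gradient = min(self.rtt_diff_us / self.min_rtt_us, 1.0)

        if rtt_us < self.t_low_us:
            # Queues are nearly empty: probe for more bandwidth.
            self.rate_mbps += self.add_step_mbps
        elif rtt_us > self.t_high_us:
            # Latency is too high regardless of trend: back off.
            self.rate_mbps *= 1 - self.beta * (1 - self.t_high_us / rtt_us)
        elif gradient <= 0:
            # RTT flat or falling: queues are draining, so increase.
            self.rate_mbps += self.add_step_mbps
        else:
            # RTT rising: queues are building, so decrease in proportion.
            self.rate_mbps *= 1 - self.beta * gradient
        return self.rate_mbps


if __name__ == "__main__":
    ctrl = GradientRateController(rate_mbps=1000.0)
    for sample_us in (100.0, 100.0, 120.0, 30.0, 1000.0):
        print(sample_us, round(ctrl.on_rtt_sample(sample_us), 1))
```

The controller reacts to the trend in delay rather than to its absolute value, which is why a rising RTT triggers a decrease even when the RTT itself is still between the two thresholds. The thresholds act only as guard rails at the extremes.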
