Cross-layer flow and congestion control for datacenter networks

A key feature of upcoming datacenter networks is losslessness, achieved by means of Priority Flow Control (PFC). Inherited from cluster and HPC networks, which traditionally use link-level flow control to prevent packet loss across multiple virtual lanes, channels, and/or priorities, this feature is now becoming widely available in next-generation 10, 40, and 100 Gbps Ethernet switches and adapters. Nevertheless, except for storage protocols such as Fibre Channel over Ethernet, PFC is new and unfamiliar to the majority of datacenter applications and protocols. Despite PFC's key role in the datacenter and its increasing availability (it is supported by virtually all future Converged Enhanced Ethernet (CEE) products), its impact on higher-layer routing and transport protocols has yet to be investigated. This motivates us to assess the performance exposure of three widespread TCP versions to PFC, as well as to the potentially conflicting Quantized Congestion Notification (QCN) congestion management mechanism, which apparently replicates at Layer 2 some of the more advanced TCP functionality. As workloads of interest, we selected a few revealing commercial and scientific applications. For quantitative performance evaluation we use two distinct methodologies: (a) our reference, an accurate Layer 2 CEE 10 Gbps network simulator coupled with TCP implementations extracted from FreeBSD v9; and (b) a hardware setup scaled down in speed and size. The main outcome of our work is that PFC can notably improve TCP performance across all tested configurations and workloads, a result validated in both environments. We therefore recommend enabling PFC whenever possible. By contrast, QCN can either harm or help, depending on its parameter settings and, essentially, on the coexistence of competing UDP or other non-congestion-managed traffic.
