Enabling ECN for Datacenter Networks With RTT Variations

ECN has been widely employed in production datacenters to deliver high throughput low latency communications. Despite being successful, prior ECN-based transports have an important drawback: they adopt a fixed RTT value in calculating instantaneous ECN marking threshold while overlooking the RTT variations in practice. In this paper, we reveal that the current practice of using a fixed high-percentile RTT for ECN threshold calculation can lead to persistent queue buildups, significantly increasing packet latency. On the other hand, directly adopting lower percentile RTTs results in throughput degradation. To handle the problem, we introduce ECN#, a simple yet effective solution to enable ECN for RTT variations. At its heart, ECN# inherits the current instantaneous ECN marking (based on a high-percentile RTT) to achieve high throughput and burst tolerance, while further marking packets (conservatively) upon detecting long-term queue buildups to eliminate unnecessary queueing delay without degrading throughput. We implement ECN# on a Barefoot Tofino switch and evaluate it through extensive testbed experiments and large-scale simulations. Our evaluation confirms that ECN# can effectively reduce latency without hurting throughput. For example, compared to the current practice, ECN# achieves up to 23.4% (31.2%) lower average (99th percentile) flow completion time (FCT) for short flows while delivering similar FCT for large flows under production workloads.

[1]  Albert G. Greenberg,et al.  Ananta: cloud scale load balancing , 2013, SIGCOMM.

[2]  Haitao Wu,et al.  RDMA over Commodity Ethernet at Scale , 2016, SIGCOMM.

[3]  Joseph Ishac Traffic Generator , 2001 .

[4]  Guido Appenzeller,et al.  Sizing router buffers , 2004, SIGCOMM '04.

[5]  Haitao Wu,et al.  Enabling ECN over Generic Packet Scheduling , 2016, CoNEXT.

[6]  Yongqiang Xiong,et al.  ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware , 2016, SIGCOMM.

[7]  Wei Bai,et al.  Information-Agnostic Flow Scheduling for Commodity Data Centers , 2015, NSDI.

[8]  Carlo Contavalli,et al.  Maglev: A Fast and Reliable Software Network Load Balancer , 2016, NSDI.

[9]  Haitao Wu,et al.  Tuning ECN for data center networks , 2012, CoNEXT '12.

[10]  Vijay Subramanian,et al.  PIE: A lightweight control scheme to address the bufferbloat problem , 2013, 2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR).

[11]  Van Jacobson,et al.  Controlling queue delay , 2012, Commun. ACM.

[12]  Ming Zhang,et al.  Duet: cloud scale load balancing with hardware and software , 2015, SIGCOMM.

[13]  George Varghese,et al.  High Speed Networks Need Proactive Congestion Control , 2015, HotNets.

[14]  Fengyuan Ren,et al.  Improving ECN marking scheme with micro-burst traffic in data center networks , 2017, IEEE INFOCOM 2017 - IEEE Conference on Computer Communications.

[15]  Ali Munir,et al.  Minimizing flow completion times in data centers , 2013, 2013 Proceedings IEEE INFOCOM.

[16]  David Santo Orcero Linux Virtual Server , 2007 .

[17]  Haitao Wu,et al.  Enabling ECN in Multi-Service Multi-Queue Data Centers , 2016, NSDI.

[18]  Adel Javanmard,et al.  Analysis of DCTCP: stability, convergence, and fairness , 2011, PERV.

[19]  QUTdN QeO,et al.  Random early detection gateways for congestion avoidance , 1993, TNET.

[20]  Mark Handley,et al.  Re-architecting datacenter networks and stacks for low latency and high performance , 2017, SIGCOMM.

[21]  Albert G. Greenberg,et al.  Data center TCP (DCTCP) , 2010, SIGCOMM '10.

[22]  Donald F. Towsley,et al.  A control theoretic analysis of RED , 2001, Proceedings IEEE INFOCOM 2001. Conference on Computer Communications. Twentieth Annual Joint Conference of the IEEE Computer and Communications Society (Cat. No.01CH37213).

[23]  Hong Liu,et al.  Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network , 2015, Comput. Commun. Rev..

[24]  Albert G. Greenberg,et al.  VL2: a scalable and flexible data center network , 2009, SIGCOMM '09.

[25]  Ming Zhang,et al.  Congestion Control for Large-Scale RDMA Deployments , 2015, Comput. Commun. Rev..

[26]  Hua Chen,et al.  Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis , 2015, SIGCOMM.

[27]  Nick McKeown,et al.  pFabric: minimal near-optimal datacenter transport , 2013, SIGCOMM.

[28]  Arvind Krishnamurthy,et al.  High-resolution measurement of data center microbursts , 2017, Internet Measurement Conference.

[29]  T. N. Vijaykumar,et al.  Deadline-aware datacenter tcp (D2TCP) , 2012, SIGCOMM '12.

[30]  Glenn Judd,et al.  Attaining the Promise and Avoiding the Pitfalls of TCP in the Datacenter , 2015, NSDI.