ALB: Adaptive Load Balancing Based on Accurate Congestion Feedback for Asymmetric Topologies

In datacenter networks, multiple paths exist between hosts to enable parallel data transmission. To ease deployment, some optimized alternatives to ECMP (e.g., CLOVE, Hermes) balance load at the virtual edge or at end hosts. However, these solutions detect and react to congestion inaccurately. They either detect congestion through ECN and coarse-grained RTT measurements or are congestion-oblivious. Such feedback is insufficient to indicate the actual congestion status under asymmetry. Moreover, when rerouting events occur across multiple paths, ACKs carrying congestion feedback from other paths can improperly influence the current sending rate. We therefore explore how to balance load by solving these inaccuracy problems while remaining compatible with commodity switches and existing network protocols. We propose ALB, an adaptive load-balancing mechanism that runs at end hosts, relies on accurate congestion feedback, and is resilient to asymmetry. ALB leverages latency-based congestion detection to precisely route flowlets to more lightly loaded paths, and an ACK correction method to avoid inaccurate flow rate adjustment. In large-scale simulations, ALB achieves up to 7% and 40% better flow completion time (FCT) than CONGA and CLOVE-ECN, respectively, under asymmetry.
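The two ideas named in the abstract, latency-based flowlet routing and ACK correction, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class and method names (`HostBalancer`, `pick_path`, `accept_feedback`), the `FLOWLET_GAP` value, and the simple rule of discarding feedback from ACKs that arrive on a path other than the current one are all assumptions made for illustration. The per-path latency estimates are taken as given inputs.

```python
from dataclasses import dataclass, field

FLOWLET_GAP = 500e-6  # hypothetical flowlet inactivity timeout, in seconds


@dataclass
class FlowState:
    path: int = -1                     # path used by the current flowlet (-1: none yet)
    last_send: float = float("-inf")   # time the last packet was sent


@dataclass
class HostBalancer:
    # path id -> latest latency estimate in seconds (assumed to be measured elsewhere)
    path_latency: dict
    flows: dict = field(default_factory=dict)

    def pick_path(self, flow_id, now):
        """Return the path for the next packet of flow_id sent at time now."""
        state = self.flows.setdefault(flow_id, FlowState())
        if state.path < 0 or now - state.last_send > FLOWLET_GAP:
            # A new flowlet starts: move to the path with the lowest latency estimate.
            state.path = min(self.path_latency, key=self.path_latency.get)
        state.last_send = now
        return state.path

    def accept_feedback(self, flow_id, ack_path):
        """Return True if an ACK's congestion feedback should drive rate adjustment.

        Simplified stand-in for ACK correction: feedback from a path the flow
        is no longer using is ignored.
        """
        state = self.flows.get(flow_id)
        return state is not None and ack_path == state.path
```

In this sketch, packets sent within `FLOWLET_GAP` of each other stay on the same path, so a path switch happens only at a flowlet boundary, which limits reordering. After a switch, ACKs that still report on the old path are not allowed to change the sending rate.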
