Luopan: Sampling based load balancing in data center networks

Data center networks demand high-performance, robust, and practical data plane load balancing protocols. Despite progress, existing work falls short of satisfying these requirements. We design and evaluate Luopan, a novel sampling-based load balancing protocol that addresses these shortcomings. Luopan operates at flowcell granularity, similar to Presto. It periodically samples a few paths to each destination switch and directs flowcells to the least congested one. Because it is congestion-aware, Luopan improves flow completion time (FCT) and is more robust to topological asymmetries than Presto. The sampling approach simplifies the protocol and makes it much more scalable to implement in large-scale networks than existing congestion-aware schemes. We conduct comprehensive packet-level simulations with a production workload. The results show that Luopan consistently outperforms state-of-the-art schemes in large-scale symmetric and asymmetric topologies. Compared to Presto, Luopan with 2 samples improves the 99th-percentile FCT of mice flows by up to 45% and the average FCT of medium flows by about 20%.
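
The sampling step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `SampledPathSelector`, the `probe` callback, and the path and switch identifiers are hypothetical names introduced here, and the abstract does not say which congestion signal is measured or how often sampling runs. The sketch only assumes that, once per sampling period, a few candidate paths to a destination switch are probed and the least congested one is used for subsequent flowcells.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional


class SampledPathSelector:
    """Illustrative sampling-based path chooser (all names are hypothetical)."""

    def __init__(
        self,
        paths: Dict[Hashable, List[int]],
        probe: Callable[[Hashable, int], float],
        num_samples: int = 2,
        rng: Optional[random.Random] = None,
    ) -> None:
        self.paths = paths              # destination switch -> candidate path ids
        self.probe = probe              # assumed congestion estimate; lower is better
        self.num_samples = num_samples  # number of paths probed per sampling period
        self.rng = rng or random.Random()
        self.best: Dict[Hashable, int] = {}  # destination -> path for new flowcells

    def resample(self, dst: Hashable) -> int:
        """Probe a few random paths to dst and keep the least congested one."""
        candidates = self.paths[dst]
        k = min(self.num_samples, len(candidates))
        sampled = self.rng.sample(candidates, k)  # k distinct paths
        self.best[dst] = min(sampled, key=lambda p: self.probe(dst, p))
        return self.best[dst]

    def path_for_flowcell(self, dst: Hashable) -> int:
        """Return the path that the next flowcell to dst should take."""
        if dst not in self.best:
            self.resample(dst)
        return self.best[dst]


# Toy usage: four paths to one destination switch with made-up load values.
load = {0: 0.9, 1: 0.1, 2: 0.5, 3: 0.7}
selector = SampledPathSelector(
    {"leaf7": [0, 1, 2, 3]},
    probe=lambda dst, path: load[path],
    num_samples=2,
    rng=random.Random(1),
)
chosen = selector.resample("leaf7")
```

With two distinct samples, the chosen path is never the most loaded of all candidates, which is the intuition behind sampling a small number of paths instead of tracking congestion on every path.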
