Luopan: Sampling-Based Load Balancing in Data Center Networks

Data center networks demand high-performance, robust, and practical data plane load balancing protocols. Despite progress, existing work falls short of meeting these requirements. We design, analyze, and evaluate Luopan, a novel sampling-based load balancing protocol that overcomes these challenges. Luopan operates at flowcell granularity, similar to Presto. It periodically samples a few paths for each destination switch and directs flowcells to the least congested one. By being congestion-aware, Luopan improves flow completion time (FCT) and is more robust to topological asymmetries than Presto. The sampling approach simplifies the protocol and makes it much more scalable to implement in large-scale networks than existing congestion-aware schemes. We provide analysis to show that Luopan's periodic sampling has the same asymptotic behavior as instantaneous sampling: taking 2 random samples provides exponential improvements over 1 sample. We conduct comprehensive packet-level simulations with production workloads. The results show that Luopan consistently outperforms state-of-the-art schemes in large-scale topologies. Compared to Presto, Luopan with 2 samples improves the 99.9th-percentile FCT of mice flows by up to 35 percent, and the average FCT of medium and elephant flows by up to 30 percent. Luopan also performs significantly better than Local Sampling under large asymmetry.
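The sampling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `PeriodicPathSampler`, its methods, and the use of a per-path queue length as the congestion metric are all assumptions made for illustration. Once per sampling period, it draws a few distinct paths uniformly at random for a destination switch, keeps the least congested one, and sends every flowcell for that destination along it until the next period.

```python
import random


class PeriodicPathSampler:
    """Hypothetical sketch of periodic 'sample d paths, pick the best'."""

    def __init__(self, num_paths, num_samples=2, rng=None):
        if not 1 <= num_samples <= num_paths:
            raise ValueError("num_samples must be between 1 and num_paths")
        self.num_paths = num_paths
        self.num_samples = num_samples
        self.rng = rng or random.Random()
        # Destination switch -> path index chosen in the latest period.
        self.current_path = {}

    def resample(self, dst, congestion):
        """Run once per sampling period for destination `dst`.

        `congestion` is a callable mapping a path index to a congestion
        value (lower is better); the metric itself is an assumption here.
        """
        # Draw distinct candidate paths uniformly at random.
        candidates = self.rng.sample(range(self.num_paths), self.num_samples)
        # Keep the least congested candidate until the next period.
        best = min(candidates, key=congestion)
        self.current_path[dst] = best
        return best

    def route_flowcell(self, dst):
        """Return the path every flowcell to `dst` uses in this period."""
        return self.current_path[dst]


if __name__ == "__main__":
    queue_lengths = [5, 1, 7, 3]  # made-up congestion values per path
    sampler = PeriodicPathSampler(num_paths=4, num_samples=2)
    chosen = sampler.resample("leaf-1", queue_lengths.__getitem__)
    print("flowcells to leaf-1 use path", sampler.route_flowcell("leaf-1"))
```

With two distinct samples, the single most congested path can never be selected, which gives some intuition for why 2 samples help so much more than 1.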
