VMS: Load Balancing Based on the Virtual Switch Layer in Datacenter Networks

Many load balancing solutions have been proposed for datacenter networks, but almost all of them require modifications to the network fabric and/or virtual machines. Recently, the virtual switch layer has become an ideal location for datacenter operators to address the load balancing problem. In this paper, we propose Virtual Multi-channel Scatter (VMS), a packet-level load balancing design in the virtual switch layer. VMS scatters the packets of one TCP flow across several different forwarding paths (channels). VMS has several noteworthy properties. First, VMS is low cost and transparent to tenants. It can be deployed even when datacenter operators do not want to change the network fabric or cannot control the transport protocol inside VMs. Second, by employing window-based channel selection, VMS adapts to network congestion and topology asymmetry. Third, unlike other packet-level load balancing schemes, VMS works well with the Generic Segmentation Offload/Generic Receive Offload (GSO/GRO) mechanisms in the Linux kernel. Finally, VMS can also be offloaded to a SmartNIC to further reduce CPU overhead. Our evaluations show that VMS achieves performance comparable to the ideal packet-level scheme in normal cases and handles topology asymmetries well, while modifying only the virtual switch layer. In a symmetric topology, VMS achieves up to 47% and 22% better flow completion time (FCT) than Equal Cost MultiPath (ECMP) and the best-of-breed flowlet-level scheme CONGA, respectively. Under topology asymmetry, VMS outperforms the ideal packet-level scheme and CONGA by up to $3.0\times$ and $1.4\times$, respectively. Further, the overhead of VMS is tolerable.
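The abstract does not spell out how window-based channel selection works, so the following is only a minimal sketch of the general idea, not the VMS algorithm itself. It assumes each channel keeps a packet window and an in-flight counter, that each packet goes to the channel with the most unused window, and that a congestion signal halves the window. The names `Channel`, `pick_channel`, `scatter`, and `on_congestion` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One forwarding path. All fields are hypothetical illustration state."""
    name: str
    window: int         # assumed: max packets allowed in flight on this path
    in_flight: int = 0  # packets sent on this path and not yet acknowledged

    def free(self) -> int:
        """Unused window on this channel."""
        return self.window - self.in_flight


def pick_channel(channels):
    """Return the channel with the most unused window, or None if all are full."""
    best = max(channels, key=lambda c: c.free())
    return best if best.free() > 0 else None


def scatter(num_packets, channels):
    """Assign the packets of one flow to channels.

    Returns one channel name per packet; None means the packet is held back
    because every channel's window is exhausted.
    """
    assignment = []
    for _ in range(num_packets):
        ch = pick_channel(channels)
        if ch is None:
            assignment.append(None)
            continue
        ch.in_flight += 1
        assignment.append(ch.name)
    return assignment


def on_ack(ch):
    """An acknowledged packet frees one slot in the channel's window."""
    ch.in_flight = max(0, ch.in_flight - 1)


def on_congestion(ch):
    """Assumed reaction to a congestion signal: halve the window (floor of 1)."""
    ch.window = max(1, ch.window // 2)
```

In this sketch, a channel with a larger window naturally receives a larger share of the flow's packets, and shrinking the window of a congested or slower path shifts traffic toward the others. That is one plausible way a window-based scheme could adapt to congestion and topology asymmetry.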
