WCMP: weighted cost multipathing for improved fairness in data centers

Data Center topologies employ multiple paths among servers to deliver scalable, cost-effective network capacity. The simplest and the most widely deployed approach for load balancing among these paths, Equal Cost Multipath (ECMP), hashes flows among the shortest paths toward a destination. ECMP leverages uniform hashing of balanced flow sizes to achieve fairness and good load balancing in data centers. However, we show that ECMP further assumes a balanced, regular, and fault-free topology, which are invalid assumptions in practice that can lead to substantial performance degradation and, worse, variation in flow bandwidths even for same size flows. We present a set of simple algorithms that achieve Weighted Cost Multipath (WCMP) to balance traffic in the data center based on the changing network topology. The state required for WCMP is already disseminated as part of standard routing protocols and it can be readily implemented in the current switch silicon without any hardware modifications. We show how to deploy WCMP in a production OpenFlow network environment and present experimental and simulation results to show that variation in flow bandwidths can be reduced by as much as 25X by employing WCMP relative to ECMP.

[1]  John Moy,et al.  OSPF Version 2 , 1998, RFC.

[2]  R. K. Shyamasundar,et al.  Introduction to algorithms , 1996 .

[3]  Curtis Villamizar,et al.  OSPF Optimized Multipath (OSPF-OMP) , 1999 .

[4]  Lionel M. Ni,et al.  Traffic engineering with MPLS in the Internet , 2000, IEEE Netw..

[5]  Christian E. Hopps,et al.  Analysis of an Equal-Cost Multi-Path Algorithm , 2000, RFC.

[6]  Clifford Stein,et al.  Introduction to Algorithms, 2nd edition. , 2001 .

[7]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[8]  Edith Cohen,et al.  Coping with network failures: routing strategies for optimal demand oblivious restoration , 2004, SIGMETRICS '04/Performance '04.

[9]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[10]  Yin Zhang,et al.  Finding critical traffic matrices , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[11]  Sudipta Sengupta,et al.  Efficient and robust routing of highly variable traffic , 2005 .

[12]  William J. Dally,et al.  Flattened butterfly: a cost-efficient topology for high-radix networks , 2007, ISCA '07.

[13]  Amin Vahdat,et al.  A scalable, commodity data center network architecture , 2008, SIGCOMM '08.

[14]  Haitao Wu,et al.  MDCube: a high performance network structure for modular data center interconnection , 2009, CoNEXT '09.

[15]  Jung Ho Ahn,et al.  HyperX: topology, routing, and packaging of efficient large-scale networks , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[16]  Haitao Wu,et al.  BCube: a high performance, server-centric network architecture for modular data centers , 2009, SIGCOMM '09.

[17]  Martín Casado,et al.  Onix: A Distributed Control Platform for Large-scale Production Networks , 2010, OSDI.

[18]  Amin Vahdat,et al.  Hedera: Dynamic Flow Scheduling for Data Center Networks , 2010, NSDI.

[19]  Sujata Banerjee,et al.  ElasticTree: Saving Energy in Data Center Networks , 2010, NSDI.

[20]  David A. Maltz,et al.  Network traffic characteristics of data centers in the wild , 2010, IMC '10.

[21]  Michael I. Jordan,et al.  Managing data transfers in computer clusters with orchestra , 2011, SIGCOMM.

[22]  Sujata Banerjee,et al.  DevoFlow: scaling flow management for high-performance networks , 2011, SIGCOMM.

[23]  VL2: a scalable and flexible data center network , 2011, Commun. ACM.

[24]  Praveen Yalagandula,et al.  Mahout: Low-overhead datacenter traffic management using end-host-based elephant detection , 2011, 2011 Proceedings IEEE INFOCOM.

[25]  Sujata Banerjee,et al.  DevoFlow: scaling flow management for high-performance networks , 2011, SIGCOMM 2011.

[26]  M. Handley,et al.  Improving datacenter performance and robustness with multipath TCP , 2011, SIGCOMM.

[27]  A. Krishnamurthy,et al.  F10: A Fault-Tolerant Engineered Network , 2013, NSDI.

[28]  Thomas E. Anderson,et al.  F10: A Fault-Tolerant Engineered Network , 2013, NSDI.