Scalable Multi-Class Traffic Management in Data Center Backbone Networks

Large online service providers (OSPs) often build private backbone networks to interconnect data centers in multiple locations. These data centers house numerous applications that produce multiple classes of traffic with diverse performance objectives. Applications in the same class may also differ in their relative importance to the OSP's core business. Because an OSP controls both the hosts and the routers, it can perform both application rate control and network routing. However, centralized management of both rates and routes does not scale, because it requires excessive message passing between the hosts, routers, and management systems. Fully distributed approaches do not scale either, and they converge slowly. To overcome these issues, we investigate two semi-centralized designs that lie at practical points along the spectrum between fully distributed and fully centralized solutions. We achieve scalability by distributing computation across multiple tiers of an optimization machinery. Our first design uses two tiers, representing the backbone and the traffic classes, to compute class-level link bandwidths and application sending rates. Our second design adds a third tier representing individual data centers. Using optimization theory, we show that both designs provably maximize the aggregate utility over all traffic classes. Simulations on realistic backbones show that the 3-tier design is more scalable than the 2-tier design but converges more slowly.
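
To make the two-tier idea concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a single link, weighted logarithmic utilities, and a simple price-based update; the function names (allocate_class, two_tier_allocate), the step size, and the iteration count are all hypothetical choices made for illustration. The backbone tier decides how much bandwidth each class receives, and each class tier independently splits its share among its applications and reports back the marginal utility of extra bandwidth.

```python
def allocate_class(weights, bandwidth):
    """Class tier (illustrative): split `bandwidth` among applications to
    maximize sum(w_i * log(x_i)). The optimum is a weight-proportional
    split, and the marginal utility of extra bandwidth is sum(w) / bandwidth."""
    total = sum(weights)
    rates = [bandwidth * w / total for w in weights]
    price = total / bandwidth
    return rates, price


def two_tier_allocate(class_weights, capacity, step=1.0, iters=500):
    """Backbone tier (illustrative): shift link bandwidth toward classes that
    report a higher marginal utility until all reported prices are equal."""
    n = len(class_weights)
    shares = [capacity / n] * n  # start from an equal split
    for _ in range(iters):
        prices = [allocate_class(w, y)[1]
                  for w, y in zip(class_weights, shares)]
        mean_price = sum(prices) / n
        # Subtracting the mean keeps the total allocated bandwidth unchanged.
        shares = [y + step * (p - mean_price)
                  for y, p in zip(shares, prices)]
        # Guard (assumption): keep every share positive, then renormalize.
        floor = 1e-6 * capacity
        shares = [max(y, floor) for y in shares]
        scale = capacity / sum(shares)
        shares = [y * scale for y in shares]
    rates = [allocate_class(w, y)[0] for w, y in zip(class_weights, shares)]
    return shares, rates


# Two classes on a link of capacity 10: the first class has two
# applications with weights 2 and 1, the second has one with weight 1.
shares, rates = two_tier_allocate([[2.0, 1.0], [1.0]], capacity=10.0)
```

In this toy setting the class-level shares should settle near 7.5 and 2.5, which is the weight-proportional split that maximizes the aggregate utility. Only one price per class crosses between the tiers in each iteration, which hints at why a tiered design can reduce message passing compared with a fully centralized controller.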
