Semi-Oblivious Traffic Engineering: The Road Not Taken

Networks are expected to provide reliable performance under a wide range of operating conditions, yet existing traffic engineering (TE) solutions optimize for either performance or robustness, not both. A key factor in the quality of a TE system is the set of paths used to carry traffic. Some systems rely on shortest paths, which leads to excessive congestion in topologies with bottleneck links, while others use congestion-minimizing paths, which are brittle and prone to failure. This paper presents a system that uses a set of paths computed with Räcke’s oblivious routing algorithm, together with a centralized controller that dynamically adapts sending rates. Although oblivious routing and centralized TE have each been studied in isolation, their combination is novel and powerful. We built a software framework to model TE solutions and conducted extensive experiments across a large number of topologies and scenarios, including the production backbone of a large content provider and an ISP. Our results show that semi-oblivious routing provides near-optimal performance and is far more robust than state-of-the-art systems.
