Network architecture for joint failure recovery and traffic engineering

Today's networks typically handle traffic engineering (e.g., tuning the routing-protocol parameters to optimize the flow of traffic) and failure recovery (e.g., pre-installed backup paths) independently. In this paper, we propose a unified way to balance load efficiently under a wide range of failure scenarios. Our architecture supports flexible splitting of traffic over multiple precomputed paths, with efficient path-level failure detection and automatic load balancing over the remaining paths. We propose two candidate solutions that differ in how the routers rebalance the load after a failure, leading to a trade-off between router complexity and load-balancing performance. We present and solve the optimization problems that compute the configuration state for each router. Our experiments with traffic measurements and topology data (including shared risks in the underlying transport network) from a large ISP identify a "sweet spot" that achieves near-optimal load balancing under a variety of failure scenarios, with a relatively small amount of state in the routers. We believe that our solution for joint traffic engineering and failure recovery will appeal to Internet Service Providers as well as the operators of data-center networks.
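As a rough illustration of the rebalancing step, the sketch below redistributes a router's traffic-splitting weights over the paths that survive a failure. It assumes the simplest possible rule, proportional renormalization of the pre-failure weights; the abstract does not specify either of the two candidate solutions at this level of detail, so the function name `rebalance`, the path labels, and the weights are all hypothetical.

```python
def rebalance(weights, failed):
    """Renormalize splitting weights over the paths that are still up.

    weights: dict mapping a path id to its pre-failure splitting ratio
    failed:  set of path ids reported down by path-level failure detection
    Assumption: surviving paths keep their relative proportions.
    """
    alive = {p: w for p, w in weights.items() if p not in failed}
    total = sum(alive.values())
    if total == 0:
        # Every path with nonzero weight has failed; nothing to rebalance.
        raise ValueError("no surviving path carries traffic")
    return {p: w / total for p, w in alive.items()}


# Hypothetical configuration: three precomputed paths between one router pair.
weights = {"p1": 0.5, "p2": 0.25, "p3": 0.25}
after = rebalance(weights, failed={"p2"})
print(after)
```

A rule like this needs only one weight per path in the router. A scheme that stores a separate set of weights for each failure scenario could balance load better at the cost of more state, which is the kind of trade-off between router complexity and load-balancing performance that the abstract describes.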
