Simple Failure Resilient Load Balancing

To enable reliable data delivery and balance load in the presence of failures, we propose a new mechanism that combines path protection and traffic engineering. The key benefit of our solution is its simplicity, allowing for fast recovery while imposing minimal requirements on the routers. To provide resilience against every failure scenario from a known set, we advocate using a fixed set of parallel end-to-end paths for each traffic demand. Upon detecting a path failure, the ingress router uses a local rule to rebalance the outgoing traffic on the remaining available paths. We describe several candidate rebalancing algorithms and analyze their performance. Although calculating the optimal set of paths and the path-splitting parameters for each router is NP-hard, our extensive simulations on a tier-1 IP backbone demonstrate that our easy-to-calculate heuristic suffices to achieve nearly optimal load balancing. We believe that a simple-to-implement solution with a fast recovery time, such as ours, will appeal to Internet Service Providers as well as the operators of data centers and enterprise networks.
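
To make the idea of a local rebalancing rule concrete, here is a minimal sketch of one plausible rule: when some paths fail, the ingress router renormalizes its splitting weights over the surviving paths. This is an illustrative assumption, not necessarily one of the candidate algorithms studied in the paper; the function name `rebalance`, the path labels, and the weights are all hypothetical.

```python
def rebalance(weights, failed):
    """Renormalize splitting weights over the surviving paths.

    weights: dict mapping a path label to its share of the demand (hypothetical values).
    failed:  set of path labels the ingress router has detected as down.
    Assumed rule: surviving paths keep their relative proportions.
    """
    alive = {path: w for path, w in weights.items() if path not in failed}
    total = sum(alive.values())
    if total == 0:
        # Either every path failed or the survivors carried no traffic.
        raise ValueError("no surviving path carries traffic")
    return {path: w / total for path, w in alive.items()}


if __name__ == "__main__":
    # Three parallel end-to-end paths for one traffic demand.
    splits = {"p1": 0.5, "p2": 0.3, "p3": 0.2}
    print(rebalance(splits, failed={"p1"}))
```

The rule needs only state that the ingress router already holds (its own weights and the set of failed paths), which is in keeping with the abstract's emphasis on minimal router requirements.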
