Dynamic risk-aware routing for OSPF networks

Carrier networks are designed to provide high availability of communication services. However, recovery mechanisms are triggered only after a failure has occurred, so they cannot prevent some impact on traffic flows. Yet there are often forewarning signs that a network device is about to stop working properly. Based on an embedded, real-time risk-level assessment, proactive fault management can isolate failing routers from the routed topology and thus entirely avoid a detrimental impact on service availability. Our novel approach enables routers to preventively steer traffic away from risky paths by temporarily tuning OSPF link costs. The consequences in terms of stability and availability are estimated with an analytical model and then simulated to measure the expected benefits of the proposed proactive self-healing function. Finally, the function has been implemented in an experimental prototype to validate the proof of concept.
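
The idea of steering traffic away from a risky router by raising OSPF link costs can be illustrated with a minimal sketch. It is not the authors' implementation. The risk threshold, the rule that a link inherits the highest risk of its two endpoints, the choice of the maximum OSPF interface cost (65535) as the penalty, and all names (`risk_adjusted_cost`, `shortest_path`, `LINKS`) are assumptions made for illustration only. The sketch recomputes a shortest path with Dijkstra's algorithm, as a link-state router would after the costs change.

```python
import heapq

MAX_COST = 65535  # largest value of the 16-bit OSPF interface cost


def risk_adjusted_cost(base_cost, risk, threshold=0.5):
    """Hypothetical policy: keep the configured cost while the risk is below
    the threshold, otherwise advertise the maximum cost."""
    return MAX_COST if risk >= threshold else base_cost


def shortest_path(links, risk, src, dst):
    """Dijkstra over undirected links {(u, v): cost} with risk-adjusted costs.

    `risk` maps a router name to a risk level in [0, 1]; routers that are
    missing from the map are treated as risk-free (assumption).
    """
    adj = {}
    for (u, v), cost in links.items():
        # Assumption: a link is as risky as its riskiest endpoint.
        link_risk = max(risk.get(u, 0.0), risk.get(v, 0.0))
        adjusted = risk_adjusted_cost(cost, link_risk)
        adj.setdefault(u, []).append((v, adjusted))
        adj.setdefault(v, []).append((u, adjusted))

    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        if u == dst:
            break
        for v, cost in adj.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if dst not in dist:
        return None, None  # destination unreachable

    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


# Toy topology: two paths from A to D, the one through B being cheaper.
LINKS = {("A", "B"): 10, ("B", "D"): 10, ("A", "C"): 20, ("C", "D"): 20}

normal_path, normal_cost = shortest_path(LINKS, {}, "A", "D")
risky_path, risky_cost = shortest_path(LINKS, {"B": 0.8}, "A", "D")
```

With no risk reported, the path goes through B at cost 20. Once B's risk exceeds the threshold, the links attached to B are advertised at the maximum cost and the path moves to C at cost 40, while B remains reachable as a last resort. Because the penalty is a cost change and not a link removal, restoring the original costs once the risk subsides brings the topology back to its initial state.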
