Proactive fault management based on risk-augmented routing

Carrier networks need to provide their customers with highly available communication services. Unfortunately, failures are currently handled by recovery mechanisms that intervene only after a failure has occurred, in order to limit its impact on traffic flows. However, there are often warning signs that a network device is about to stop working properly. We propose taking this risk exposure into account to improve the performance of existing restoration mechanisms, in particular for IP networks. Based on an embedded, real-time risk-level assessment, we can perform proactive fault management and isolate failing routers from the routed topology, thereby avoiding service unavailability altogether. Our novel approach enables routers to preventively steer traffic away from risky paths by temporarily tuning OSPF link costs.
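To make the idea of steering traffic away from a risky router more concrete, the sketch below inflates the cost of every link touching a router whose risk level exceeds a threshold, then recomputes a Dijkstra shortest path as an OSPF SPF calculation would. It is a minimal illustration under stated assumptions, not the method described here: the risk scores in [0, 1], the `threshold` value, the helper names `augment_costs` and `shortest_path`, and the simplification of treating links as undirected with a single cost are all hypothetical. The value 65535 is the largest 16-bit OSPF interface cost.

```python
import heapq

# Largest value a 16-bit OSPF interface cost can take.
MAX_METRIC = 65535


def augment_costs(links, risk, threshold=0.7):
    """Hypothetical risk-augmentation rule (an assumption, not the paper's).

    links: {(u, v): base_cost} for an undirected topology.
    risk:  {router: risk level in [0, 1]}; routers not listed are assumed safe.
    Any link with an endpoint at or above `threshold` is set to MAX_METRIC,
    so SPF avoids transiting that router whenever an alternative path exists.
    """
    augmented = {}
    for (u, v), cost in links.items():
        if max(risk.get(u, 0.0), risk.get(v, 0.0)) >= threshold:
            augmented[(u, v)] = MAX_METRIC
        else:
            augmented[(u, v)] = cost
    return augmented


def shortest_path(links, src, dst):
    """Dijkstra over {(u, v): cost}; returns (path, total_cost).

    Raises KeyError if dst is unreachable (kept simple for the sketch).
    """
    # Build an undirected adjacency list from the link table.
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in adj.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))

    # Walk predecessors back from dst to src.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return list(reversed(path)), dist[dst]


if __name__ == "__main__":
    links = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 2}
    print(shortest_path(links, "A", "D"))  # normal routing goes through B
    risky = augment_costs(links, {"B": 0.9})  # B shows warning signs
    print(shortest_path(risky, "A", "D"))  # traffic now avoids B
```

With these example values, traffic from A to D normally transits B (cost 2); once B's assumed risk exceeds the threshold, the recomputed path is A, C, D (cost 4) before B actually fails. Setting the cost to the maximum rather than scaling it gradually mimics isolating the router from the routed topology; restoring the base costs once the risk subsides corresponds to the temporary nature of the tuning.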