A practical scheme for MPLS fault monitoring and alarm correlation in backbone networks

As backbone networks evolve, today's multiple networks are being replaced with a single global multi-protocol label switching (MPLS)-enabled backbone over an intelligent optical IP-based core network. A fault management system (FMS) therefore becomes critical for network service providers to monitor network health and performance and to quickly identify and resolve operational problems. In this paper, we present a practical scheme for the fault management of MPLS-enabled backbone networks. First, we describe a hierarchical fault management architecture that scales well to large backbone networks. Then, we present an OAM tool, called MPLS Connectivity Monitor (CMON), that monitors MPLS operation and generates MPLS alarms. After that, we propose a hybrid technique that combines event aggregation, a rule-based method, and a codebook approach to efficiently correlate MPLS alarms with other equipment and service alarms. Finally, we report test results obtained from a large-scale backbone network to demonstrate the effectiveness of the proposed scheme.
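
To make the codebook idea mentioned above concrete, the following is a minimal sketch of generic codebook correlation, not the implementation evaluated in this paper. Each candidate root cause is assigned the set of alarms it is expected to trigger, and an observed alarm set is decoded to the root cause whose expected set is closest in Hamming distance. The root-cause names, the alarm names, and the `decode` function are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical codebook: root cause -> alarms it is expected to raise.
# These names are illustrative assumptions, not taken from the paper.
CODEBOOK: Dict[str, FrozenSet[str]] = {
    "fiber_cut": frozenset({"LOS", "LINK_DOWN", "LSP_DOWN"}),
    "router_card_failure": frozenset({"LINK_DOWN", "LSP_DOWN", "CARD_FAULT"}),
    "ldp_session_loss": frozenset({"LSP_DOWN", "LDP_SESSION_DOWN"}),
}


def hamming_distance(expected: FrozenSet[str], observed: Set[str]) -> int:
    """Count alarms present in exactly one of the two sets."""
    return len(expected ^ observed)


def decode(observed: Set[str]) -> Tuple[str, int]:
    """Return the closest root cause and its distance to the observed alarms.

    Ties are broken alphabetically by root-cause name so the result is
    deterministic.
    """
    cause, expected = min(
        CODEBOOK.items(),
        key=lambda item: (hamming_distance(item[1], observed), item[0]),
    )
    return cause, hamming_distance(expected, observed)


if __name__ == "__main__":
    # One expected alarm (LSP_DOWN) was lost, yet the nearest code still wins.
    print(decode({"LOS", "LINK_DOWN"}))
```

Matching on distance rather than on exact equality is what lets this style of correlation tolerate a lost or spurious alarm, provided the codes for different root causes are far enough apart.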
