Fusion Based Approach for Distributed Alarm Correlation in Computer Networks

We propose a new approach to distributed alarm correlation and fault identification in computer networks. The managed network is divided into disjoint management domains, and each management domain is assigned a dedicated intelligent agent. The intelligent agent is responsible for collecting, analyzing, and correlating the alarms emitted by the constituent entities of its domain. In the framework of Dempster-Shafer evidence theory, each agent perceives each alarm as a piece of evidence for the occurrence of a certain fault hypothesis and correlates the received alarms into a single alarm, called a local composite alarm, which encapsulates the agent’s partial view of the current status of the managed system. These local composite alarms are, in turn, sent to a higher-level agent whose task is to fuse them and form a global view of the operational status of the running network. Thus, although the alarm correlation process is performed locally, the alarms of each intelligent agent are ultimately correlated globally. Extensive experiments have demonstrated that the proposed approach is more tolerant of alarm loss than codebook-based approaches, which shows its effectiveness in typically noisy network environments.
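The fusion step rests on Dempster's rule of combination, the standard operator for combining independent pieces of evidence in Dempster-Shafer theory. The sketch below shows how per-alarm evidence could be combined into a local composite alarm and how a higher-level agent could then fuse the local composites. It is an illustration under stated assumptions, not the paper's implementation: the alarm-to-evidence mapping (`alarm_evidence`, a "suspect set" plus a reliability weight), the fault names, and all function names are hypothetical.

```python
from functools import reduce
from typing import Dict, FrozenSet, Iterable

# A mass function (basic probability assignment): focal set -> mass.
Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination for two mass functions on the same frame."""
    combined: Mass = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                # Mass assigned to contradictory (disjoint) hypotheses.
                conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("totally conflicting evidence cannot be combined")
    # Renormalize so the non-conflicting mass sums to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def alarm_evidence(suspects: Iterable[str], reliability: float,
                   frame: FrozenSet[str]) -> Mass:
    """HYPOTHETICAL alarm-to-evidence mapping: the alarm says 'one of
    `suspects` is faulty' with weight `reliability`; the remaining mass
    goes to the whole frame, i.e. to ignorance."""
    s = frozenset(suspects)
    if s == frame:
        return {frame: 1.0}
    return {s: reliability, frame: 1.0 - reliability}


def composite(masses: Iterable[Mass]) -> Mass:
    """Fuse several pieces of evidence into one mass function."""
    return reduce(combine, masses)


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Bel(H): total mass committed to subsets of H."""
    return sum(w for s, w in m.items() if s <= hypothesis)


if __name__ == "__main__":
    frame = frozenset({"link_A", "link_B", "router_R"})
    # Domain 1 agent: two alarms -> one local composite alarm.
    local1 = composite([
        alarm_evidence({"link_A", "router_R"}, 0.8, frame),
        alarm_evidence({"link_A"}, 0.6, frame),
    ])
    # Domain 2 agent: a single alarm.
    local2 = composite([alarm_evidence({"link_A", "link_B"}, 0.7, frame)])
    # Higher-level agent fuses the local composites into a global view.
    global_view = combine(local1, local2)
    print(round(belief(global_view, frozenset({"link_A"})), 3))  # → 0.824
```

Because Dempster's rule is commutative and associative, fusing the local composites at the higher-level agent yields the same global mass function as combining every raw alarm centrally, provided the alarms are treated as independent evidence. Mass left on the whole frame models ignorance, which is one reason an evidential scheme can degrade gracefully when some alarms are lost.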
