A graph-based proactive fault identification approach in computer networks

In large-scale computer networks, isolating the primary source of a failure is a challenging task. This article presents a proactive network fault diagnosis approach based on graph theory. Unlike other approaches, in which the manager of the network management system passively receives messages from the managed objects, this approach has the manager actively check the status of the managed devices. The salient feature of the approach is that the set of possible failure sources, which includes the real one, can be computed precisely and quickly without historical alarm information or strict assumptions. By making full use of matrix and Boolean operations, the approach adds little processing complexity. To test and evaluate the proposed algorithm, we implemented it in Java and tested it in a real, large network environment. The experimental results show that the approach is both efficient and scalable for fault identification in large-scale computer networks.
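
The abstract does not spell out the algorithm. What follows is only a minimal sketch of how matrix and Boolean operations could narrow a set of unresponsive devices down to possible failure sources. Every assumption here is ours and not the paper's. We assume each device is probed along a fixed path described by a hypothetical `parent` array, which forms a routing tree rooted at the manager (node 0). We also assume that only nodes fail and that a device which answers proves every node on its probe path is working. Warshall's Boolean transitive closure builds the path matrix. Row-wise OR and AND-NOT then yield the candidates. The functions `warshall`, `path_matrix` and `suspects` are illustrative names only.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical sketch: assumes a tree of probe paths rooted at the manager
# (node 0) and node-only failures. This is not the paper's algorithm.

def warshall(m: List[List[bool]]) -> List[List[bool]]:
    """Boolean transitive closure (Warshall): r[i][j] is True iff j can be
    reached from i by following one or more edges of m."""
    n = len(m)
    r = [row[:] for row in m]
    for k in range(n):
        for i in range(n):
            if r[i][k]:
                # Row-wise Boolean OR: whatever k reaches, i reaches too.
                r[i] = [a or b for a, b in zip(r[i], r[k])]
    return r

def path_matrix(parent: List[Optional[int]]) -> List[List[bool]]:
    """on_path[d][x] is True iff a probe from the manager to device d
    traverses node x (d itself included)."""
    n = len(parent)
    up = [[False] * n for _ in range(n)]
    for child, par in enumerate(parent):
        if par is not None:
            up[child][par] = True  # child is reached through its parent
    on_path = warshall(up)         # on_path[d][x]: x is an ancestor of d
    for d in range(n):
        on_path[d][d] = True       # a device lies on its own probe path
    return on_path

def suspects(parent: List[Optional[int]],
             responded: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (candidates, primary) given the devices that answered a probe."""
    n = len(parent)
    on_path = path_matrix(parent)
    cleared = [False] * n     # nodes proven up by some successful probe
    implicated = [False] * n  # nodes on the path of some failed probe
    for d in range(n):
        if d in responded:
            cleared = [c or p for c, p in zip(cleared, on_path[d])]
        else:
            implicated = [s or p for s, p in zip(implicated, on_path[d])]
    # Candidates: implicated AND NOT cleared.
    cand = {x for x in range(n) if implicated[x] and not cleared[x]}
    # Primary sources: candidates with no other candidate above them.
    primary = {x for x in cand
               if not any(on_path[x][y] for y in cand if y != x)}
    return cand, primary

# Manager 0 -> router 1 -> switches 2, 3; hosts 4, 5 behind 2; host 6 behind 3.
parent = [None, 0, 1, 1, 2, 2, 3]
cand, primary = suspects(parent, responded={0, 1, 3, 6})
print(sorted(cand), sorted(primary))  # [2, 4, 5] [2]
```

In this toy topology a failure of switch 2 silences devices 2, 4 and 5. The candidate set contains all three of them, and therefore includes the real source. The topmost-candidate filter then singles out switch 2. The Boolean-row formulation keeps each step to a pass of OR/AND-NOT over bit vectors. Bit vectors could be packed into machine words, although this sketch does not do that.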
