Reducing the Inaccuracy Caused by Inappropriate Time Window in Probabilistic Fault Localization

To reduce the inaccuracy caused by an inappropriate time window, we propose two probabilistic fault localization schemes based on the idea of “extending time window.” The global window extension algorithm (GWE) applies the window extension strategy to all candidate faults, while the on-demand window extension algorithm (OWE) applies the extended window only to a small set of faults, and only when necessary. Both algorithms can increase the metric values of actual faults and thus improve the accuracy of fault localization. Simulation results show that both schemes outperform existing algorithms. Furthermore, OWE outperforms GWE at the cost of slightly more computing time.
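The contrast between the two strategies can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define the metric, the extension rule, or the criterion for "when necessary." The sketch assumes a hypothetical coverage metric (the fraction of a fault's expected symptoms observed in the window), a window extended forward by a fixed amount `extra`, and an OWE trigger that re-scores only faults whose base-window score falls in an assumed ambiguous band `[low, high)`. All function and parameter names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Observation = Tuple[float, str]  # (timestamp, symptom id)


def symptoms_in(obs: List[Observation], start: float, end: float) -> Set[str]:
    """Symptoms observed in the half-open interval [start, end)."""
    return {s for t, s in obs if start <= t < end}


def coverage(expected: Set[str], seen: Set[str]) -> float:
    """Hypothetical metric: share of a fault's expected symptoms that were seen."""
    return len(expected & seen) / len(expected) if expected else 0.0


def score_gwe(faults: Dict[str, Set[str]], obs: List[Observation],
              start: float, end: float, extra: float) -> Dict[str, float]:
    """Global extension: every candidate fault is scored on the extended window."""
    seen = symptoms_in(obs, start, end + extra)
    return {f: coverage(exp, seen) for f, exp in faults.items()}


def score_owe(faults: Dict[str, Set[str]], obs: List[Observation],
              start: float, end: float, extra: float,
              low: float = 0.3, high: float = 1.0) -> Dict[str, float]:
    """On-demand extension: score on the base window first, then re-score only
    the faults whose base score lies in the assumed band [low, high)."""
    base = symptoms_in(obs, start, end)
    extended = symptoms_in(obs, start, end + extra)
    scores = {}
    for f, exp in faults.items():
        s = coverage(exp, base)
        if low <= s < high:  # assumed "when necessary" criterion
            s = coverage(exp, extended)
        scores[f] = s
    return scores


# Illustrative data: symptom s2 of link_a arrives just after the base window closes.
faults = {"link_a": {"s1", "s2"}, "link_b": {"s3", "s4", "s5", "s6"}}
obs = [(1.0, "s1"), (11.0, "s2"), (12.0, "s3")]

gwe = score_gwe(faults, obs, start=0.0, end=10.0, extra=5.0)
owe = score_owe(faults, obs, start=0.0, end=10.0, extra=5.0)
```

With this data, both variants raise the score of `link_a` from 0.5 to 1.0 once the late symptom is included. Under GWE, `link_b` also picks up the stray symptom `s3` from the extension, whereas the OWE sketch leaves `link_b` at its base-window score because it was never a plausible candidate.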
