Internet Services Fault Management: Layering Model and Algorithm

In window-based Internet service fault management, an improperly chosen time window size degrades the fault diagnosis algorithm. To reduce this impact, this paper analyzes the challenges of Internet service fault management and recommends a layering model. A bipartite graph is chosen as the fault propagation model (FPM) for each layer. A window-based fault diagnosis algorithm, MFD (multi-window fault diagnosis), is proposed for the bipartite FPM. MFD takes the correlation between adjacent time windows into account and can therefore reduce the impact of an improper time window size. Simulation results demonstrate the validity and efficiency of MFD.
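
The abstract does not give the details of MFD, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical bipartite FPM stored as a mapping from faults to the symptoms they can cause, uses a plain greedy set-cover heuristic as a stand-in for the diagnosis step, and models the "correlation of adjacent time windows" simply by diagnosing each window together with the symptoms of the window before it. The names `FPM`, `greedy_explain`, and `diagnose_windows` are illustrative assumptions.

```python
from typing import Dict, List, Set

# Hypothetical bipartite FPM: each fault maps to the symptoms it can cause.
FPM: Dict[str, Set[str]] = {
    "f1": {"s1", "s2"},
    "f2": {"s2", "s3"},
    "f3": {"s4"},
}


def greedy_explain(fpm: Dict[str, Set[str]], observed: Set[str]) -> List[str]:
    """Greedy set cover: repeatedly pick the fault that explains the most
    still-unexplained symptoms (ties broken by fault name)."""
    remaining = set(observed)
    chosen: List[str] = []
    while remaining:
        best = max(sorted(fpm), key=lambda f: len(fpm[f] & remaining))
        covered = fpm[best] & remaining
        if not covered:
            break  # leftover symptoms have no known cause in the model
        chosen.append(best)
        remaining -= covered
    return chosen


def diagnose_windows(
    fpm: Dict[str, Set[str]], windows: List[Set[str]]
) -> List[List[str]]:
    """Diagnose each window using its own symptoms plus those observed in
    the immediately preceding window (an assumed form of correlation)."""
    results: List[List[str]] = []
    previous: Set[str] = set()
    for current in windows:
        results.append(greedy_explain(fpm, previous | current))
        previous = set(current)
    return results


if __name__ == "__main__":
    # A fault whose symptoms s3 and s2 straddle a window boundary.
    print(diagnose_windows(FPM, [{"s3"}, {"s2"}]))
    # The second window diagnosed in isolation, for comparison.
    print(greedy_explain(FPM, {"s2"}))
```

In this toy case the symptoms of one fault are split across a window boundary. Diagnosed in isolation, the second window is ambiguous between `f1` and `f2`; including the previous window's symptoms favors `f2`, which explains both. This illustrates why considering adjacent windows can soften the effect of a poorly chosen window size.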
