A Method of Building the Fault Propagation Model of Distributed Application Systems Based on Bayesian Networks

Fault diagnosis is a key research topic in network fault management. To diagnose faults effectively in increasingly complex distributed application systems (DAS) built on computer networks, an accurate and practicable fault propagation model (FPM) is generally a prerequisite for subsequent tasks such as probabilistic reasoning, fault recovery, and failure prediction. This paper proposes a Bayesian-network-based method for constructing the FPM that combines sample data with expert knowledge. First, an initial tree (T) containing all the service nodes of the specific DAS is generated from sample data by the Maximum Weight Spanning Tree (MWST) algorithm. Second, the initial tree (T) is revised according to expert experience. Finally, the FPM of the DAS is learned with the Greedy Search structure-learning algorithm, using the revised structure (T’) as its initial model. The FPM learned by the proposed method is evaluated by calculating its BIC score and comparing it with the actual model. The results show that the proposed method yields an accurate FPM of the distributed application system.
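
The first and last steps of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: each service node is binary (0 = normal, 1 = faulty), the MWST edge weight is the empirical mutual information between two nodes (the usual Chow-Liu choice), and the tree is directed away from an arbitrarily chosen root. The function names and the sample data are hypothetical. The expert revision and the greedy search are not modeled here; `bic_score` is only the scoring function such a search would maximize.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    joint = Counter((row[i], row[j]) for row in samples)
    count_i = Counter(row[i] for row in samples)
    count_j = Counter(row[j] for row in samples)
    # Sum of p(a,b) * log(p(a,b) / (p(a) * p(b))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in joint.items()
    )


def max_weight_spanning_tree(samples, num_nodes):
    """Kruskal's algorithm over edges sorted by decreasing mutual information."""
    edges = sorted(
        ((mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(num_nodes), 2)),
        reverse=True,
    )
    root_of = list(range(num_nodes))  # union-find forest

    def find(x):
        while root_of[x] != x:
            root_of[x] = root_of[root_of[x]]  # path halving
            x = root_of[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not create a cycle
            root_of[ri] = rj
            tree.append((i, j))
    return tree


def orient_from_root(tree, num_nodes, root=0):
    """Direct the undirected tree away from `root`; returns {node: [parents]}."""
    neighbours = {v: [] for v in range(num_nodes)}
    for i, j in tree:
        neighbours[i].append(j)
        neighbours[j].append(i)
    parents = {root: []}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in neighbours[u]:
            if v not in parents:
                parents[v] = [u]
                stack.append(v)
    return parents


def bic_score(samples, parents):
    """BIC of a DAG given as {node: [parents]}, assuming binary nodes."""
    n = len(samples)
    score = 0.0
    for node, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[node]) for row in samples)
        marginal = Counter(tuple(row[p] for p in pa) for row in samples)
        log_lik = sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
        # A binary node has one free parameter per parent configuration.
        score += log_lik - 0.5 * math.log(n) * (2 ** len(pa))
    return score


# Hypothetical observations of three services; each row is one sample.
samples = (
    [(0, 0, 0)] * 4 + [(1, 1, 1)] * 4
    + [(0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
)
initial_tree = max_weight_spanning_tree(samples, num_nodes=3)
initial_model = orient_from_root(initial_tree, num_nodes=3)
```

In the proposed method, a structure such as `initial_model` would first be revised by an expert and then passed to the greedy search as its starting point, with candidate edge additions, deletions, and reversals accepted only when they raise the BIC score.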
