MDFM: Multi-domain Fault Management for Internet Services

This paper analyzes the new requirements of service-oriented fault management and proposes a framework, MDFM (Multi-Domain Fault Manager), to solve the service fault localization problem in a multi-domain context. Unlike current solutions, our approach decomposes the SLS (Service Level Specification) based on network capability and monitors service performance in each domain along the end-to-end path. As a result, MDFM can rapidly localize the approximate domain in which the root cause resides, which narrows the causative region and reduces the computation cost of fault analysis. MDFM considers faults on both the server and client sides. A prototype has been implemented to demonstrate the feasibility and efficiency of the framework.
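The idea of decomposing an end-to-end SLS into per-domain targets and comparing them with per-domain measurements can be illustrated with a minimal sketch. It is not the MDFM implementation: the proportional split rule, the delay-only SLS, the domain names, and the functions `decompose_sls` and `suspect_domains` are all assumptions made for illustration.

```python
from typing import Dict, List


def decompose_sls(end_to_end_ms: float, capability_ms: Dict[str, float]) -> Dict[str, float]:
    """Split an end-to-end delay bound into per-domain budgets.

    Assumption: each domain receives a share proportional to its nominal
    delay capability. The source only says the decomposition is based on
    network capability; the exact rule here is hypothetical.
    """
    total = sum(capability_ms.values())
    return {domain: end_to_end_ms * cap / total for domain, cap in capability_ms.items()}


def suspect_domains(budget_ms: Dict[str, float], measured_ms: Dict[str, float]) -> List[str]:
    """Return the domains whose measured delay exceeds their budget."""
    return [domain for domain in budget_ms if measured_ms[domain] > budget_ms[domain]]


# Hypothetical three-domain path from client to server.
capabilities = {"client_access": 10.0, "transit": 30.0, "server_access": 20.0}
budgets = decompose_sls(120.0, capabilities)

# Hypothetical per-domain measurements along the end-to-end path.
measured = {"client_access": 15.0, "transit": 95.0, "server_access": 35.0}
suspects = suspect_domains(budgets, measured)
print(suspects)
```

In this sketch only the domains that violate their own budget are handed to detailed fault analysis, which is one way to picture how per-domain monitoring narrows the causative region.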
