Algorithm design and application of service-oriented event correlation

The timely and efficient management of faults that affect the quality of services delivered to customers is important to service providers' business goals. It includes the diagnosis of service faults, that is, the localization of their root causes within the subservices and resources that are part of the service realization. This paper details our service-oriented event correlation approach, which uses event correlation techniques to automate diagnosis on the service layer. We present our algorithm for the hybrid rule-based/case-based correlation methodology, which also includes recently proposed active probing techniques, together with its prototypical implementation at the Leibniz Supercomputing Center. This implementation is not limited to a small test environment but has been carried out to meet the requirements of this large service provider's environment.
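
The general idea of a hybrid rule-based/case-based diagnosis with active probing can be illustrated with a minimal sketch. This is not the algorithm of the paper: the dependency model, the service and resource names, the `Case` structure, the Jaccard similarity measure, and the `probe` callback are all assumptions chosen for illustration, and the dependency graph is assumed to be acyclic.

```python
from dataclasses import dataclass

# Hypothetical dependency model: each service maps to the subservices
# and resources it relies on (assumed acyclic).
DEPENDENCIES = {
    "web_hosting": ["dns", "storage"],
    "dns": ["dns_server_1"],
    "storage": ["nas_filer"],
}


@dataclass(frozen=True)
class Case:
    """A previously solved fault: observed symptoms and the root cause found."""
    symptoms: frozenset
    root_cause: str


def leaf_resources(service, deps):
    """Collect the resources a service transitively depends on."""
    children = deps.get(service, [])
    if not children:
        return {service}
    found = set()
    for child in children:
        found |= leaf_resources(child, deps)
    return found


def case_based(symptoms, case_base):
    """Return the stored case most similar to the symptoms (Jaccard), or None."""
    def similarity(case):
        union = case.symptoms | symptoms
        return len(case.symptoms & symptoms) / len(union) if union else 0.0
    return max(case_base, key=similarity, default=None)


def diagnose(service, symptoms, resource_events, deps, case_base, probe=None):
    """Try rules first, then active probes, then fall back to past cases."""
    resources = leaf_resources(service, deps)
    # Rule step: correlate the service fault with resource events below it.
    candidates = sorted(resources & set(resource_events))
    if candidates:
        return ("rule", candidates)
    # Active probing step: test resources for which no event was reported.
    if probe is not None:
        failed = sorted(r for r in resources if not probe(r))
        if failed:
            return ("probe", failed)
    # Case step: reuse the root cause of the most similar past case.
    best = case_based(frozenset(symptoms), case_base)
    return ("case", [best.root_cause]) if best else ("unresolved", [])
```

In this sketch the case base is consulted only when neither reported events nor probes explain the service fault, which mirrors the common motivation for hybrid designs: rules cover known dependencies, while cases cover situations the rules do not anticipate.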
