Discovering Dynamic Dependencies in Enterprise Environments for Problem Determination

To reduce mean time to recovery (MTTR) in heterogeneous enterprise environments, it should be possible to determine quickly and easily the root cause of a problem detected at a higher level, e.g., through a response time violation of a transaction category, and to resolve it. Many problem determination applications use a component dependency graph to pinpoint the root cause. However, such graphs are often constructed manually. This paper introduces a simple, non-intrusive technique, based on mining existing runtime monitoring data, to construct a dynamic dependency graph between the components of an enterprise environment. The graph is traversed to identify nodes that are the cause of response-time-related problems.
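As a rough illustration of the traversal step only, the following sketch walks a dependency graph from a transaction category that violates its response time target and reports the deepest components that are also in violation. The abstract does not specify the paper's actual mining or traversal algorithm, so everything here is an assumption made for illustration: the adjacency-list representation, the `violating` set, the function name `root_cause_candidates`, and the component names are all hypothetical.

```python
from collections import deque


def root_cause_candidates(graph, violating, start):
    """Breadth-first walk from `start` along violating dependencies.

    graph:     dict mapping a component to the components it depends on
               (hypothetical representation, not taken from the paper)
    violating: set of components currently showing response time problems
    start:     the violating high-level node where the problem was detected

    Returns the violating nodes that have no violating dependency of their
    own, i.e. the deepest points the problem can be traced to.
    """
    candidates = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Only follow edges to dependencies that are themselves in violation.
        bad_deps = [dep for dep in graph.get(node, []) if dep in violating]
        if not bad_deps:
            # Nothing below this node is misbehaving: treat it as a candidate.
            candidates.append(node)
        for dep in bad_deps:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return candidates


# Hypothetical environment: a transaction category and the components it uses.
graph = {
    "checkout_txn": ["web_server"],
    "web_server": ["app_server", "auth_service"],
    "app_server": ["database"],
    "auth_service": [],
    "database": [],
}
violating = {"checkout_txn", "web_server", "app_server", "database"}

print(root_cause_candidates(graph, violating, "checkout_txn"))
```

In this toy setup the walk ends at `database`, the only violating component with no violating dependency beneath it. A real system would derive both the edges and the violation flags from monitored data rather than hard-coding them.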
