Approaches to building self-healing systems using dependency analysis

Typical distributed transaction environments consist of a heterogeneous collection of hardware and software resources. An example of such an environment is an electronic storefront, where users can launch a number of different transactions to complete one or more interactions with the system. One of the challenges in managing such an environment is determining the root cause of a performance or throughput problem that manifests itself at a user access point, and taking appropriate action, preferably automatically. Our paper addresses this problem by analyzing the dependency relationships among the various software components. We also provide theoretical insight into how a set of transactions can be generated to pinpoint the root cause of a performance problem that manifests itself at the user access point.
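The abstract does not spell out how the transaction set is generated. The sketch below illustrates one common formulation, not necessarily the paper's own method: given a hypothetical map from each transaction to the components it exercises, a greedy, set-cover-style heuristic keeps adding transactions until every single-component fault (and the no-fault state) produces a distinct pass/fail pattern; observed failures are then matched against those patterns. It assumes that only one component is faulty at a time and that a transaction fails whenever any component it depends on is faulty. The names `select_transactions`, `diagnose`, `NO_FAULT`, and the example components and transactions are illustrative only.

```python
from itertools import combinations

# Hypothetical label for the "every component is healthy" state.
NO_FAULT = "no-fault"


def separates(deps, txn, pair):
    """True if transaction `txn` depends on exactly one of the two states in `pair`."""
    a, b = pair
    return (a in deps[txn]) != (b in deps[txn])


def select_transactions(deps, components):
    """Greedily choose transactions until each single-component fault,
    plus the no-fault state, yields a distinct pass/fail signature.

    deps: dict mapping a transaction name to the set of components it uses
    (an assumed, illustrative dependency model).
    """
    states = list(components) + [NO_FAULT]
    unresolved = set(combinations(states, 2))  # pairs not yet told apart
    chosen = []
    while unresolved:
        # Pick the transaction that separates the most still-ambiguous pairs
        # (ties go to the first transaction in dict order).
        best = max(deps, key=lambda t: sum(separates(deps, t, p) for p in unresolved))
        if not any(separates(deps, best, p) for p in unresolved):
            raise ValueError(f"cannot distinguish: {sorted(unresolved)}")
        chosen.append(best)
        unresolved = {p for p in unresolved if not separates(deps, best, p)}
    return chosen


def diagnose(deps, chosen, components, failed):
    """Match the observed failed transactions against each state's signature."""
    observed = tuple(t in failed for t in chosen)
    for state in list(components) + [NO_FAULT]:
        if tuple(state in deps[t] for t in chosen) == observed:
            return state
    return None  # no single-fault explanation (e.g. several components failed)


if __name__ == "__main__":
    # Hypothetical storefront: which tiers each transaction touches.
    components = ["web", "app", "db"]
    deps = {
        "browse": {"web", "app"},
        "search": {"web", "app", "db"},
        "checkout": {"web", "app", "db"},
        "login": {"web", "db"},
        "static": {"web"},
    }
    chosen = select_transactions(deps, components)
    print(chosen)  # ['browse', 'login']
    print(diagnose(deps, chosen, components, {"browse", "search", "checkout"}))  # app
```

In this toy model, two transactions suffice to separate four states, because their failure patterns (both fail, only `browse` fails, only `login` fails, neither fails) point uniquely to the web tier, the application tier, the database, or no fault. The greedy choice is a heuristic: finding a minimum separating set is itself a hard combinatorial problem, so the selected set need not be the smallest possible.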
