Log filtering and interpretation for root cause analysis

Problem diagnosis in large software systems is a challenging and complex task. The sheer size and complexity of the logged data often make it difficult for human operators and administrators to perform problem diagnosis and root cause analysis. A challenge in this area is to provide operators with the means, tools, and techniques to focus their attention on specific parts of the logged data, thereby reducing the complexity of the diagnostic process. In this paper, we propose a framework for filtering logs according to specific analysis goals and diagnostic hypotheses set by the user or by an automated process. More specifically, the proposed framework uses annotated goal trees to model the constraints and conditions under which the functionality of a particular system is delivered. Next, a transformation process maps these constraints and conditions to a collection of queries, which can either be applied to a relational database that stores the logged data or be evaluated with Latent Semantic Indexing to identify the log entries most relevant to a given query. The results of these queries form a subset of the logged data that is compliant with the goal tree and can be used by a SAT-solver-based diagnostic algorithm. Experimental results show that the filtering process can reduce the time and complexity of diagnosis when applied to multitier, heterogeneous, service-oriented systems.
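The mapping from an annotated goal tree to a relational query can be illustrated with a minimal sketch. It is not the framework's actual transformation: the Goal class, the keyword annotations, the logs table and its columns (id, ts, source, message), and the sample log entries are all hypothetical, chosen only to show how annotations might become a filter over stored log data. The Latent Semantic Indexing path and the SAT-based diagnosis step are not covered here.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Goal:
    """Node of a hypothetical annotated goal tree."""
    name: str
    # Assumed annotation format: terms expected in relevant log messages.
    keywords: list = field(default_factory=list)
    children: list = field(default_factory=list)


def collect_annotations(goal):
    """Gather (goal name, keyword) pairs from every node in the tree."""
    pairs = [(goal.name, kw) for kw in goal.keywords]
    for child in goal.children:
        pairs.extend(collect_annotations(child))
    return pairs


def build_query(goal, start, end):
    """Map the tree's annotations to one parameterized SQL query."""
    pairs = collect_annotations(goal)
    # With no annotations, fall back to a condition that matches nothing.
    clauses = " OR ".join("message LIKE ?" for _ in pairs) or "0"
    sql = (
        "SELECT id, ts, source, message FROM logs "
        f"WHERE ts BETWEEN ? AND ? AND ({clauses}) ORDER BY ts"
    )
    params = [start, end] + [f"%{kw}%" for _, kw in pairs]
    return sql, params


def demo():
    """Filter a small in-memory log table using a two-level goal tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs ("
        "id INTEGER PRIMARY KEY, ts INTEGER, source TEXT, message TEXT)"
    )
    # Invented sample entries, for illustration only.
    conn.executemany(
        "INSERT INTO logs (ts, source, message) VALUES (?, ?, ?)",
        [
            (100, "web", "GET /index.html 200"),
            (105, "app", "login failed for user alice"),
            (110, "db", "connection timeout on pool-1"),
            (120, "web", "GET /favicon.ico 404"),
            (130, "app", "session created for user bob"),
            (500, "db", "connection timeout on pool-2"),
        ],
    )
    tree = Goal(
        "process_order",
        children=[
            Goal("authenticate_user", keywords=["login", "session"]),
            Goal("store_order", keywords=["timeout"]),
        ],
    )
    sql, params = build_query(tree, start=100, end=200)
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In this sketch the query keeps only entries that fall inside the analysis time window and mention a term attached to some goal, so unrelated web requests and the out-of-window timeout are dropped before any diagnostic reasoning would run. A single OR-combined query is used for brevity; one query per goal node would preserve which goal each entry supports.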
