LogTree: A Framework for Generating System Events from Raw Textual Logs

Modern computing systems are instrumented to generate huge amounts of system logs, and these data can be used to understand complex system behaviors. A fundamental challenge in automated log analysis is the generation of system events from raw textual logs. Recent works apply clustering techniques to translate raw log messages into system events using only word/term information. In this paper, we first illustrate the drawbacks of existing techniques for event generation from system logs. We then propose LogTree, a novel and algorithm-independent framework for event generation from raw system log messages. LogTree uses the format and structural information of the raw logs in the clustering process to generate system events with better accuracy. In addition, an indexing data structure, the Message Segment Table, is proposed in LogTree to significantly improve the efficiency of event creation. Extensive experiments on real system logs demonstrate the effectiveness and efficiency of LogTree.
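The idea of using message structure in addition to word overlap can be illustrated with a minimal sketch. This is not the paper's algorithm: the delimiter set, the `decay` weighting, and the functions `segments`, `similarity`, and `build_segment_index` are all hypothetical choices made for illustration. The index is only loosely inspired by the notion of a table keyed on message segments.

```python
import re
from collections import defaultdict


def segments(message):
    """Split a raw log line into segments on an assumed delimiter set (: ; , [ ])."""
    parts = re.split(r"[:;,\[\]]+", message)
    return [p.strip().lower() for p in parts if p.strip()]


def similarity(a, b, decay=0.5):
    """Position-weighted word overlap between aligned segments.

    Earlier segments (often the component or process name) weigh more than
    later ones. The weighting scheme is an assumption, not taken from the paper.
    """
    sa, sb = segments(a), segments(b)
    n = max(len(sa), len(sb))
    if n == 0:
        return 1.0
    total = sum(decay ** i for i in range(n))
    score = 0.0
    for i, (x, y) in enumerate(zip(sa, sb)):
        wx, wy = set(x.split()), set(y.split())
        score += (decay ** i) * len(wx & wy) / len(wx | wy)  # Jaccard overlap
    return score / total


def build_segment_index(messages):
    """Map each distinct segment to the indices of the messages containing it."""
    index = defaultdict(list)
    for i, message in enumerate(messages):
        for seg in set(segments(message)):
            index[seg].append(i)
    return index


logs = [
    "sshd[1234]: Failed password for root",
    "sshd[5678]: Failed password for admin",
    "kernel: Out of memory",
]
print(similarity(logs[0], logs[1]), similarity(logs[0], logs[2]))
print(build_segment_index(logs)["sshd"])
```

In this toy example the two `sshd` messages score higher against each other than either does against the `kernel` message, because they share a leading segment. Any clustering algorithm that accepts a pairwise similarity could consume such a function.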
