A Lightweight Algorithm for Message Type Extraction in System Application Logs

Message type (or message cluster) extraction is an important task in the analysis of system logs in computer networks. Defining these message types automatically facilitates the automatic analysis of system logs: when the message types that exist in a log file are represented explicitly, they can form the basis for other automatic application log analysis tasks. In this paper, we introduce a novel algorithm for extracting message types from event log files. IPLoM, which stands for Iterative Partitioning Log Mining, works through a 4-step process. The first three steps hierarchically partition the event log into groups of event log messages, or event clusters. In its fourth and final step, IPLoM produces a message type description, or line format, for each of the message clusters. IPLoM is able to find clusters in the data irrespective of the frequency of their instances, scales gracefully in the case of long message type patterns, and produces message type descriptions at the level of abstraction preferred by a human observer. Evaluations show that IPLoM outperforms similar algorithms by a statistically significant margin.
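The partition-then-describe idea can be illustrated with a minimal sketch. It is not the IPLoM implementation: the abstract does not spell out the partitioning criteria, so the sketch assumes a single partitioning pass by token count as a stand-in for the three hierarchical steps. The function names `partition_by_token_count` and `line_format`, the `*` wildcard, and the sample messages are all hypothetical.

```python
from collections import defaultdict


def partition_by_token_count(messages):
    """Group messages by their number of whitespace-separated tokens.

    Assumption: one simple partitioning criterion, standing in for the
    three hierarchical partitioning steps mentioned in the abstract.
    """
    groups = defaultdict(list)
    for msg in messages:
        tokens = msg.split()
        groups[len(tokens)].append(tokens)
    return dict(groups)


def line_format(cluster, wildcard="*"):
    """Describe a cluster of equal-length token lists as a line format.

    A position keeps its token if every message agrees on it;
    otherwise it is replaced by the wildcard.
    """
    columns = zip(*cluster)
    return " ".join(
        col[0] if len(set(col)) == 1 else wildcard for col in columns
    )


# Hypothetical sample log messages, for illustration only.
messages = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.7 closed",
    "Disk sda1 is 91% full now",
    "Disk sdb2 is 75% full now",
]

clusters = partition_by_token_count(messages)
formats = sorted(line_format(c) for c in clusters.values())
print(formats)
# → ['Connection from * closed', 'Disk * is * full now']
```

Here the two resulting line formats play the role of message type descriptions: constant tokens are kept and variable fields are abstracted away. Real logs would likely need further partitioning, since unrelated messages can share a token count.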
