An integrated framework on mining logs files for computing system management

Traditional approaches to system management rely largely on domain experts, whose knowledge is translated into operating rules and policies through a knowledge acquisition process. This process is well known to be cumbersome, labor-intensive, and error-prone, and it struggles to keep up with rapidly changing environments. In this paper, we describe our research efforts toward an integrated framework for mining system log files for automatic management. In particular, we apply text mining techniques to categorize log messages into common situations, improve categorization accuracy by considering the temporal characteristics of log messages, develop temporal mining techniques to discover relationships between different events, and use visualization tools to evaluate and validate the interesting temporal patterns for system management.
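As an illustration of the first step only (categorizing log messages into common situations), the following is a minimal sketch and not the framework's actual implementation. It assumes a multinomial naive Bayes classifier with add-one smoothing as the text mining technique; the class name `NaiveBayesLogClassifier`, the situation labels (`start`, `stop`, `connection`), and the sample messages are all hypothetical and chosen purely for demonstration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Lowercase a log message and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", message.lower())


class NaiveBayesLogClassifier:
    """Hypothetical multinomial naive Bayes categorizer for log messages."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.label_counts = Counter()            # label -> number of messages
        self.vocabulary = set()

    def fit(self, messages, labels):
        for message, label in zip(messages, labels):
            tokens = tokenize(message)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocabulary.update(tokens)
        return self

    def predict(self, message):
        tokens = tokenize(message)
        total_messages = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, message_count in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior of the situation label.
            score = math.log(message_count / total_messages)
            # Add-one smoothing keeps unseen words from zeroing the score.
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled messages; real training data would come from system logs.
TRAINING = [
    ("service httpd started successfully", "start"),
    ("database engine started on port 5432", "start"),
    ("service httpd stopped by administrator", "stop"),
    ("database engine stopped after shutdown request", "stop"),
    ("connection to host db01 refused", "connection"),
    ("lost connection to remote host", "connection"),
]

classifier = NaiveBayesLogClassifier().fit(
    [message for message, _ in TRAINING],
    [label for _, label in TRAINING],
)
predicted = classifier.predict("mail service started")
print(predicted)
```

The sketch treats each message independently. Using the temporal characteristics of log messages to improve accuracy, as the abstract describes, would require additional modeling that is not shown here.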
