Mining Logs Files for Computing System Management

With advancement in science and technology, computing systems become increasingly more difficult to monitor, manage and maintain. Traditional approaches to system management have been largely based on domain experts through a knowledge acquisition process to translate domain knowledge into operating rules and policies. This has been experienced as a cumbersome, labor intensive, and error prone process. There is thus a pressing need for automatic and efficient approaches to monitor and manage complex computing systems. A popular approach to system management is based on analyzing system log files. However, several new aspects of the system log data have been less emphasized in existing analysis methods and posed several challenges. The aspects include disparate formats and relatively short text messages in data reporting, asynchronous data collection, and temporal characteristics in data representation. First, a typical computing system contains different devices with different software components, possibly from different providers. These various components have multiple ways to report events, conditions, errors and alerts. The heterogeneity and inconsistency of log formats make it difficult to automate problem determination. To perform automated analysis, we need to categorize the text messages with disparate formats into common situations. Second, text messages in the log files are relatively short with a large vocabulary size. Third, each text message usually contains a timestamp. The temporal characteristics provide additional context information of the messages and can be used to facilitate data analysis. In this paper, we apply text mining to automatically categorize the messages into a set of common categories, and propose two approaches of incorporating temporal information to improve the categorization performance

[1]  Gautam Biswas,et al.  Temporal Pattern Generation Using Hidden Markov Model Based Unsupervised Classification , 1999, IDA.

[2]  Ke Wang,et al.  Building Hierarchical Classifiers Using Class Proximity , 1999, VLDB.

[3]  Heikki Mannila,et al.  Discovering Frequent Episodes in Sequences , 1995, KDD.

[4]  Naftali Tishby,et al.  Document clustering using word clusters via the information bottleneck method , 2000, SIGIR '00.

[5]  Joseph L. Hellerstein,et al.  Discovering actionable patterns in event data , 2002, IBM Syst. J..

[6]  Tim Leek,et al.  Information Extraction Using Hidden Markov Models , 1997 .

[7]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[8]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[9]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[10]  C. Collin,et al.  An Introduction to Natural Computation , 1998, Trends in Cognitive Sciences.

[11]  Andrew McCallum,et al.  Information Extraction with HMM Structures Learned by Stochastic Optimization , 2000, AAAI/IAAI.

[12]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[13]  Steven F. Roth,et al.  An Interactive Visualization Environment for Data Exploration , 1997, KDD.

[14]  Tao Li,et al.  Document clustering via adaptive subspace iteration , 2004, SIGIR '04.

[15]  Andreas S. Weigend,et al.  Exploiting Hierarchy in Text Categorization , 1999, Information Retrieval.

[16]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[17]  Jade Goldstein-Stewart,et al.  A Framework for Knowledge-based Interactive Data Exploration , 1994, J. Vis. Lang. Comput..

[18]  Irina Rish,et al.  An empirical study of the naive Bayes classifier , 2001 .

[19]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[20]  Jakub Zavrel,et al.  Information Extraction by Text Classification: Corpus Mining for Features , 2000 .

[21]  Jeffrey O. Kephart,et al.  The Vision of Autonomic Computing , 2003, Computer.

[22]  John Stearley,et al.  Towards informatic analysis of syslogs , 2004, 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935).

[23]  Matthew O. Ward,et al.  XmdvTool: integrating multiple methods for visualizing multivariate data , 1994, Proceedings Visualization '94.

[24]  Andrew McCallum,et al.  A comparison of event models for naive bayes text classification , 1998, AAAI 1998.

[25]  Kazem Taghva,et al.  Address extraction using hidden Markov models , 2005, IS&T/SPIE Electronic Imaging.

[26]  Joseph L. Hellerstein,et al.  EventBrowser: A Flexible Tool for Scalable Analysis of Event Data , 1999, DSOM.

[27]  Tom M. Mitchell,et al.  Improving Text Classification by Shrinkage in a Hierarchy of Classes , 1998, ICML.

[28]  Sheng Ma,et al.  PICCIL: Interactive Learning to Support Log File Categorization , 2005, Second International Conference on Autonomic Computing (ICAC'05).