Using Syslog Message Sequences for Predicting Disk Failures

Mitigating the impact of computer failure is possible if accurate failure predictions are provided. Resources and services can be scheduled around a predicted failure to limit its impact. Such strategies are especially important for multi-computer systems, such as compute clusters, which experience a higher rate of failure due to their large number of components. However, providing accurate predictions with sufficient lead time remains a challenging problem. This research uses a new spectrum-kernel Support Vector Machine (SVM) approach to predict failure events based on system log files. These files contain messages that represent a change of system state. While a single message in the file may not be sufficient for predicting failure, a sequence or pattern of messages may be. This approach uses a sliding window (sub-sequence) of messages to predict the likelihood of failure. A frequency representation of the observed message sub-sequences is then used as input to the SVM, which associates the messages with a class of failed or non-failed system. Experimental results using actual system log files from a Linux-based compute cluster indicate that the proposed spectrum-kernel SVM approach can predict hard disk failure with an accuracy of 80% about one day in advance.
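
The sliding-window and frequency-representation steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window width, the sub-sequence length k, and the single-letter message tags are all hypothetical choices made for illustration. The sketch counts every contiguous length-k sub-sequence in a window of messages and compares two windows with a spectrum kernel, taken here as the inner product of their count vectors.

```python
from collections import Counter


def sliding_windows(messages, width):
    """Yield each contiguous window of `width` messages (hypothetical helper)."""
    for start in range(len(messages) - width + 1):
        yield tuple(messages[start:start + width])


def spectrum_features(window, k):
    """Count every contiguous length-k sub-sequence that occurs in a window."""
    return Counter(tuple(window[i:i + k]) for i in range(len(window) - k + 1))


def spectrum_kernel(window_a, window_b, k):
    """Inner product of the two windows' sub-sequence count vectors."""
    counts_a = spectrum_features(window_a, k)
    counts_b = spectrum_features(window_b, k)
    # Sub-sequences absent from window_b contribute zero to the sum.
    return sum(count * counts_b[sub] for sub, count in counts_a.items())


# Stand-in message tags; real syslog messages would first be mapped to symbols.
messages = ["A", "B", "A", "B", "C"]
windows = list(sliding_windows(messages, width=4))
similarity = spectrum_kernel(windows[0], windows[1], k=2)
```

In a full pipeline, kernel values of this kind (or the count vectors themselves) would be passed to an SVM trained on windows labeled as preceding a failure or not. How messages are encoded as symbols and how the window width and k are chosen are left open here.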
