Proactive drive failure prediction for large scale storage systems

Most modern hard disk drives support Self-Monitoring, Analysis and Reporting Technology (SMART), which monitors internal attributes of individual drives and predicts impending drive failures with a thresholding method. Because the prediction performance of the thresholding algorithm is disappointing, some researchers have explored various statistical and machine learning methods for predicting drive failures based on SMART attributes. However, the failure detection rates of these methods reach only 50% to 60% at low false alarm rates (FARs). We explore the ability of a Backpropagation (BP) neural network model to predict drive failures based on SMART attributes. We also develop an improved Support Vector Machine (SVM) model. A real-world dataset covering 23,395 drives is used to verify these models. Experimental results show that the prediction accuracy of both models is far higher than that of previous work. Although the SVM model achieves the lowest FAR (0.03%), the BP neural network model is considerably better in failure detection rate, which reaches up to 95% while keeping a reasonably low FAR.
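The two metrics quoted above can be illustrated with a minimal sketch. It assumes the definitions common in this literature, which the abstract does not spell out: the failure detection rate is the fraction of failed drives that were flagged, and the FAR is the fraction of good drives that were flagged. The function name `detection_metrics` and the toy data are hypothetical and are not taken from the paper.

```python
def detection_metrics(is_failed, is_flagged):
    """Return (failure_detection_rate, false_alarm_rate) for per-drive results.

    is_failed  -- list of bools, True if the drive actually failed
    is_flagged -- list of bools, True if the model predicted a failure
    Assumed definitions (not stated in the abstract):
      detection rate = flagged failed drives / all failed drives
      FAR            = flagged good drives   / all good drives
    """
    if len(is_failed) != len(is_flagged):
        raise ValueError("inputs must have the same length")

    failed = sum(1 for f in is_failed if f)
    good = len(is_failed) - failed
    true_alarms = sum(1 for f, p in zip(is_failed, is_flagged) if f and p)
    false_alarms = sum(1 for f, p in zip(is_failed, is_flagged) if (not f) and p)

    # Guard against empty classes so the sketch never divides by zero.
    fdr = true_alarms / failed if failed else 0.0
    far = false_alarms / good if good else 0.0
    return fdr, far


# Toy example: 2 failed drives (1 caught), 4 good drives (1 wrongly flagged).
truth = [True, True, False, False, False, False]
flagged = [True, False, False, True, False, False]
fdr, far = detection_metrics(truth, flagged)
print(f"detection rate = {fdr:.2%}, FAR = {far:.2%}")
```

Reporting both numbers together matters because a model can trivially raise its detection rate by flagging more drives, at the cost of a higher FAR.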
