A Two-Step Parametric Method for Failure Prediction in Hard Disk Drives

Predicting the impending failure of hard disk drives (HDDs) is crucial for preventing the loss of essential data. In this paper, a two-step parametric method is developed to predict the impending failure of HDDs using an aggregate of statistical models. The method addresses failure prediction in two steps: anomaly detection and failure prediction. First, the Mahalanobis distance is used to aggregate all the monitored variables into one index, which is then transformed into a Gaussian variable by the Box-Cox transformation. Anomalies in HDDs are then detected by applying an appropriate threshold to the transformed index. Second, a sliding-window-based generalized likelihood ratio test is proposed to track the anomaly progression in an HDD. When the occurrence of anomalies in a time interval is found to be statistically significant, the HDD is judged to be approaching failure. We also derive a new cost function to adjust the prediction rate. This makes it possible to balance the failure detection rate against the false alarm rate and to give users advance warning of HDD failures so that they can back up their data in time. The method is first applied to a synthetic data set to show its effectiveness in predicting failures. To demonstrate its practical usefulness, it is then applied to a real-life HDD data set. The results show that our method achieves a 68% failure detection rate with a 0% false alarm rate. This is much better than the results achieved by state-of-the-art methods such as support vector machines and hidden Markov models.
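
The two steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the Mahalanobis distance index has already been computed for each sample, treats the Box-Cox parameter `lam` as given, and models the anomaly flags in a window as Bernoulli trials with a healthy anomaly rate `p0`. The function names, the one-sided form of the test, and the critical value are all illustrative assumptions, and the cost function derived in the paper is not reproduced.

```python
import math
from statistics import NormalDist, mean, stdev


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive input")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def anomaly_threshold(healthy_md, lam, alpha=0.01):
    """Upper threshold from a Gaussian fitted to transformed healthy data.

    alpha is an assumed per-sample false-alarm probability.
    """
    z = [box_cox(v, lam) for v in healthy_md]
    return NormalDist(mean(z), stdev(z)).inv_cdf(1.0 - alpha)


def bernoulli_loglik(k, n, p):
    """Log-likelihood of k anomalies in n samples at rate p (0*log 0 = 0)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def glr_statistic(flags, p0):
    """2*log GLR for H0: rate = p0 against H1: rate > p0 in one window."""
    n, k = len(flags), sum(flags)
    p_hat = k / n  # maximum-likelihood estimate of the anomaly rate
    if p_hat <= p0:
        return 0.0  # no evidence of an increased anomaly rate
    return 2.0 * (bernoulli_loglik(k, n, p_hat) - bernoulli_loglik(k, n, p0))


def first_alarm(md_series, lam, threshold, p0, window, critical):
    """Index of the first sample at which a window's GLR exceeds critical."""
    flags = [1 if box_cox(v, lam) > threshold else 0 for v in md_series]
    for end in range(window, len(flags) + 1):
        if glr_statistic(flags[end - window:end], p0) > critical:
            return end - 1
    return None  # no statistically significant run of anomalies


# Toy series: ten healthy samples followed by five degraded ones.
series = [1.0] * 10 + [9.0] * 5
alarm = first_alarm(series, lam=0.5, threshold=2.0, p0=0.05,
                    window=5, critical=3.84)
```

In this sketch a single anomalous sample does not raise an alarm; the test fires only once anomalies accumulate within a window. Raising `critical` or widening `window` trades earlier warnings for fewer false alarms, which is the balance the paper's cost function is meant to tune.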
