Predicting Failures in Hard Drives with LSTM Networks

Several studies have proposed early failure detection techniques for hard disk drives in order to improve the availability of storage systems and avoid data loss. Failure prediction in this setting reduces downtime costs by allowing disks to be replaced before they fail. Many of the techniques proposed so far mainly perform incipient failure detection and therefore do not allow for proper planning of such maintenance tasks. Others perform well only under a limited prediction horizon. In this work, we present a remaining useful life estimation approach for hard disk drives, based on SMART parameters, that is capable of predicting failures over both long-term and short-term intervals by leveraging the capabilities of LSTM networks.
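To make the idea of remaining useful life estimation over several horizons concrete, the following is a minimal sketch of how a failed drive's daily SMART history could be cut into fixed-length sequences, each labeled with an interval class, before being fed to a sequence model such as an LSTM. It is an illustration under stated assumptions, not the procedure used in this work: the function names, the window length, and the interval edges are all hypothetical.

```python
from typing import List, Sequence, Tuple

Reading = List[float]  # one day's SMART attribute values (hypothetical layout)


def rul_class(days_to_failure: int, bin_edges: Sequence[int]) -> int:
    """Map a days-to-failure count to an interval index.

    bin_edges holds ascending upper bounds (assumed values, not from the paper).
    Class 0 is the shortest horizon; len(bin_edges) means "beyond the last edge".
    """
    for idx, edge in enumerate(bin_edges):
        if days_to_failure <= edge:
            return idx
    return len(bin_edges)


def make_windows(
    readings: Sequence[Reading], window: int, bin_edges: Sequence[int]
) -> List[Tuple[List[Reading], int]]:
    """Slice one failed drive's history into (sequence, class) pairs.

    Assumes readings are ordered by day and the last entry is the final
    reading taken before the drive failed.
    """
    samples = []
    last = len(readings) - 1
    for end in range(window - 1, len(readings)):
        seq = list(readings[end - window + 1 : end + 1])
        days_to_failure = last - end
        samples.append((seq, rul_class(days_to_failure, bin_edges)))
    return samples
```

With edges such as (2, 5), a sequence ending two days or less before failure falls into the short-term class, one ending three to five days before falls into the next class, and anything earlier falls into the long-term class. A classifier trained on such pairs predicts an interval rather than a single "about to fail" flag, which is the property that makes maintenance planning possible.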
