ZTE-Predictor: Disk Failure Prediction System Based on LSTM

Disk failure prediction has become an active research topic in both academia and industry, because it is important for improving data center reliability. This paper studies ZTE's disk SMART (Self-Monitoring, Analysis and Reporting Technology) data set, with the goal of predicting whether a disk will fail within 5-7 days. In the model training stage, each disk state is labeled as either normal or failure within 5 days. The positive and negative samples are then balanced using both over-sampling and under-sampling. Finally, an LSTM (Long Short-Term Memory) network is trained on the balanced data set to obtain the disk failure prediction model. In experiments on ZTE's historical data set, the best FDR (Fault Detection Rate) is 97.4% and the FAR (False Alarm Rate) is 0.3%. After 7 months of deployment in a ZTE data center, the best FDR is 94.5% and the FAR is 0.7%.
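The labeling, balancing, and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the inclusive 5-day window, the use of plain random resampling, and the conventional definitions FDR = TP / (TP + FN) and FAR = FP / (FP + TN) are all assumptions made for illustration, and the LSTM training step itself is omitted.

```python
import random

HORIZON_DAYS = 5  # the abstract labels a state as "failure within 5 days"


def label_samples(num_days, failure_day=None, horizon=HORIZON_DAYS):
    """Label each daily record of one disk (hypothetical helper).

    A record gets label 1 if the disk fails on that day or within the
    next `horizon` days (inclusive window is an assumption), else 0.
    """
    labels = []
    for day in range(num_days):
        fails_soon = failure_day is not None and 0 <= failure_day - day <= horizon
        labels.append(1 if fails_soon else 0)
    return labels


def balance(samples, labels, target_per_class, seed=0):
    """Bring both classes to `target_per_class` samples.

    Larger classes are randomly under-sampled; smaller classes are randomly
    over-sampled with replacement. This is plain random resampling, chosen
    for simplicity, and may differ from the sampling used in the paper.
    """
    rng = random.Random(seed)
    out_samples, out_labels = [], []
    for cls in (0, 1):
        group = [s for s, y in zip(samples, labels) if y == cls]
        if not group:
            raise ValueError(f"no samples with label {cls}")
        if len(group) >= target_per_class:
            chosen = rng.sample(group, target_per_class)  # under-sample
        else:
            extra = rng.choices(group, k=target_per_class - len(group))
            chosen = group + extra  # over-sample with replacement
        out_samples.extend(chosen)
        out_labels.extend([cls] * target_per_class)
    return out_samples, out_labels


def fdr_far(y_true, y_pred):
    """Return (FDR, FAR) under the assumed conventional definitions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)
```

For example, `label_samples(10, failure_day=9)` marks the last six daily records as positive, and `balance` can then equalize the two classes before a sequence model is trained on them.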
