Hard Disk Failure Prediction via Transfer Learning

As data volumes grow, the storage capacity of data centers keeps increasing. Hard disks are the main storage medium, and a disk failure can cause heavy losses for users and enterprises. To improve the reliability of storage systems, many machine learning methods have been employed over the past few decades to predict hard disk failures. However, a heterogeneous disk system contains many different disk models, and traditional machine learning methods cannot build a single model that generalizes across them. Inspired by a DANN-based unsupervised domain adaptation approach for image classification, in this paper we propose DFPTL (Disk Failure Prediction via Transfer Learning), which applies DANN to failure prediction in heterogeneous disk systems by reducing the distribution differences between the datasets of different disk models. The approach needs only unlabeled data from a specific disk model (the target domain) and labeled data collected from a different disk model of the same manufacturer (the source domain). Experimental results on real-world datasets demonstrate that DFPTL adapts effectively in the presence of domain shift and outperforms traditional machine learning algorithms.
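
The abstract does not describe how DFPTL is implemented, so the following is only a minimal sketch of the core DANN mechanism, the gradient reversal layer that sits between the feature extractor and the domain classifier. The function names and the schedule constant `gamma=10.0` are assumptions made for illustration (the constant follows the commonly cited DANN setup), not details taken from this paper.

```python
import math


def grad_reverse_forward(features):
    """Forward pass of a gradient reversal layer: the identity map."""
    return list(features)


def grad_reverse_backward(upstream_grad, lam):
    """Backward pass: flip the sign of the domain-classifier gradient and
    scale it by lam, so the feature extractor is pushed toward features
    that do not distinguish the source disk model from the target one."""
    return [-lam * g for g in upstream_grad]


def dann_lambda(progress, gamma=10.0):
    """Hypothetical adaptation-weight schedule: rises from 0 toward 1 as
    training progress goes from 0 to 1. gamma=10.0 is an assumed value."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0
```

In a full model, the label classifier would be trained only on the labeled source-domain disks, while the domain classifier would see both source and target samples through this layer.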
