Improving Failure Prediction by Ensembling the Decisions of Machine Learning Models: A Case Study

The complexity of software has grown considerably in recent years, making it nearly impossible to detect all faults before a system is pushed to production. Such faults can ultimately lead to failures at runtime. Recent works have shown that Machine Learning (ML) algorithms can be used to build models that accurately predict such failures. At the same time, methods that combine several independent learners (i.e., ensembles) have proven to outperform individual models in a variety of problems. While some well-known ensemble algorithms (e.g., Bagging) rely on multiple instances of the same base learner (homogeneous ensembles), combining different algorithms (heterogeneous ensembles) makes it possible to exploit the distinct biases of each one. However, building such ensembles is not trivial, as it requires selecting adequate base learners and suitable methods to combine their outputs. This paper presents a case study on using several ML techniques to create heterogeneous ensembles for Online Failure Prediction (OFP). More precisely, it assesses the viability of combining different learners to improve prediction performance and examines how different combination techniques influence the results. The paper also explores whether the interactions between learners can be studied and leveraged. The results suggest that combining certain learners and techniques, not necessarily the best individually, can improve the overall ability to predict failures. Additionally, studying the synergies within the best ensembles provides interesting insights into why some of them perform better.
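
To make the idea of a heterogeneous ensemble concrete, the sketch below combines a few learners with different inductive biases and contrasts two common ways of merging their outputs: majority voting and stacked generalization. It is a minimal illustration built with scikit-learn on synthetic, imbalanced data; the specific base learners, hyperparameters, and dataset are assumptions for illustration and not the configuration studied in the paper.

# Minimal sketch of a heterogeneous ensemble for binary failure prediction.
# Base learners, hyperparameters, and the synthetic data are illustrative
# assumptions, not the setup evaluated in the case study.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier, StackingClassifier
from sklearn.metrics import f1_score

# Synthetic, imbalanced stand-in for monitoring data (failure = class 1).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Heterogeneous base learners with different inductive biases.
base_learners = [
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
    ("nb", GaussianNB()),
]

# Two common combination techniques: majority voting and stacked
# generalization (a meta-learner trained on the base learners' predictions).
voting = VotingClassifier(estimators=base_learners, voting="hard")
stacking = StackingClassifier(estimators=base_learners,
                              final_estimator=LogisticRegression(max_iter=1000))

for name, ensemble in [("voting", voting), ("stacking", stacking)]:
    ensemble.fit(X_train, y_train)
    preds = ensemble.predict(X_test)
    print(f"{name}: F1 on the failure class = {f1_score(y_test, preds):.3f}")

In practice, the combination technique can matter as much as the choice of base learners: voting only aggregates final decisions, whereas stacking lets a meta-learner weigh each learner's output and potentially exploit their complementary errors.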
