Building an Ensemble for Software Defect Prediction Based on Diversity Selection

Background: Ensemble techniques have gained attention in various scientific fields. Defect prediction researchers have investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single-classifier techniques. Almost all previous work using ensemble techniques in defect prediction relies on the majority voting scheme for combining prediction outputs, and on the implicit diversity among single classifiers. Aim: Investigate whether defect prediction can be improved using an explicit diversity technique with a stacking ensemble, given that different classifiers identify different sets of defects. Method: We used classifiers from four different families and the weighted accuracy diversity (WAD) technique to exploit diversity amongst classifiers. To combine individual predictions, we used the stacking ensemble technique. We used state-of-the-art knowledge in software defect prediction to build our ensemble models, and tested their prediction abilities against eight publicly available data sets. Conclusion: The results show a performance improvement for stacking ensembles compared to other defect prediction models. Diversity amongst the classifiers used for building ensembles is essential to achieving these performance improvements.
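
To make the idea of explicit diversity selection concrete, the following is a minimal sketch in Python of choosing ensemble members by trading off accuracy against diversity. It is not the published WAD formula and not the authors' implementation: the score here is an assumed, simple weighted sum of mean member accuracy and mean pairwise disagreement (the fraction of instances on which exactly one of two classifiers is correct). The classifier labels (`nb`, `rf`, `svm`, `tree`), the toy predictions, and the helper names are all hypothetical, and the selected members would still need to be combined by a stacking meta-learner, which is not shown.

```python
from itertools import combinations

def accuracy(preds, truth):
    """Fraction of instances classified correctly."""
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

def disagreement(preds_a, preds_b, truth):
    """Pairwise disagreement: fraction of instances on which exactly
    one of the two classifiers is correct."""
    differ = sum((a == t) != (b == t) for a, b, t in zip(preds_a, preds_b, truth))
    return differ / len(truth)

def ensemble_score(members, predictions, truth, weight=0.5):
    """ASSUMED score (not the published WAD formula):
    weight * mean accuracy + (1 - weight) * mean pairwise disagreement."""
    accs = [accuracy(predictions[m], truth) for m in members]
    pairs = list(combinations(members, 2))
    divs = [disagreement(predictions[a], predictions[b], truth) for a, b in pairs]
    return weight * sum(accs) / len(accs) + (1 - weight) * sum(divs) / len(divs)

def select_ensemble(predictions, truth, size, weight=0.5):
    """Exhaustively pick the subset of `size` classifiers with the best score."""
    candidates = combinations(sorted(predictions), size)
    return max(candidates, key=lambda m: ensemble_score(m, predictions, truth, weight))

# Hypothetical toy data: 1 = defective module, 0 = non-defective.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "nb":   [1, 0, 1, 0, 0, 0, 1, 1],  # wrong on instances 3 and 7
    "rf":   [1, 0, 1, 0, 0, 0, 1, 0],  # wrong on instance 3 only
    "svm":  [1, 1, 1, 1, 0, 0, 0, 0],  # wrong on instances 1 and 6
    "tree": [0, 0, 1, 1, 0, 1, 1, 0],  # wrong on instances 0 and 5
}

best = select_ensemble(predictions, truth, size=3)
print(best)  # ('nb', 'svm', 'tree')
```

In this toy example the most accurate single classifier (`rf`) is left out, because its errors overlap with those of `nb`; the three selected classifiers each fail on different instances, which is the property a stacking meta-learner can exploit.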
