What is the Impact of Imbalance on Software Defect Prediction Performance?

Software defect prediction performance varies over a large range. Menzies suggested there is a ceiling effect of 80% Recall [8]. Most of the datasets used in defect prediction are highly imbalanced. This paper asks: what is the empirical effect of using datasets with varying levels of imbalance on predictive performance? We use data synthesised by a previous meta-analysis of 600 fault prediction models and their results. Four model evaluation measures (the Matthews Correlation Coefficient (MCC), F-Measure, Precision and Recall) are compared against the corresponding data imbalance ratio. When the data are imbalanced, the predictive performance of software defect prediction studies is low. As the data become more balanced, the predictive performance of prediction models increases, from an average MCC of 0.15, until the minority class makes up 20% of the instances in the dataset, where the MCC reaches an average value of about 0.34. As the proportion of the minority class increases above 20%, the predictive performance does not significantly increase; using datasets in which more than 20% of the instances are defective has no significant impact on predictive performance measured by MCC. We conclude that comparisons of defect prediction results should take the imbalance of the data into account.
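As a concrete illustration of how the four evaluation measures relate to data imbalance, the minimal Python sketch below (not taken from the paper; the confusion-matrix counts tp, fp, tn and fn are hypothetical example values) computes Precision, Recall, F-Measure, MCC and the proportion of defective (minority-class) instances from a single confusion matrix.

import math

def evaluate(tp, fp, tn, fn):
    # Standard confusion-matrix measures used in defect prediction studies.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    # Matthews Correlation Coefficient: takes all four cells into account,
    # so it is less flattering than Recall alone on skewed data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn - fp * fn) / denom) if denom else 0.0
    # Proportion of defective (minority-class) instances in the dataset.
    defective_ratio = (tp + fn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "mcc": mcc,
            "defective_ratio": defective_ratio}

# Hypothetical example: a highly imbalanced dataset (5% defective instances).
print(evaluate(tp=10, fp=20, tn=930, fn=40))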

[1] Nitesh V. Chawla, et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, J. Artif. Intell. Res.

[2] Rok Blagus, et al. Evaluation of SMOTE for High-Dimensional Class-Imbalanced Microarray Data, 2012, 2012 11th International Conference on Machine Learning and Applications.

[3] Tracy Hall, et al. A Systematic Literature Review on Fault Prediction Performance in Software Engineering, 2012, IEEE Transactions on Software Engineering.

[4] Gustavo E. A. P. A. Batista, et al. An Experimental Design to Evaluate Class Imbalance Treatment Methods, 2012, 2012 11th International Conference on Machine Learning and Applications.

[5] David Philip Harry Gray, et al. Software defect prediction using static code metrics: formulating a methodology, 2013.

[6] Taghi M. Khoshgoftaar, et al. Experimental perspectives on learning from imbalanced data, 2007, ICML '07.

[7] Burak Turhan, et al. Implications of ceiling effects in defect predictors, 2008, PROMISE '08.

[8] Daoud Clarke, et al. On developing robust models for favourability analysis: Model choice, feature sets and imbalanced data, 2012, Decis. Support Syst.

[9] José Javier Dolado, et al. Preliminary comparison of techniques for dealing with imbalance in software defect prediction, 2014, EASE '14.

[10] Per Runeson, et al. Detection or isolation of defects? An experimental comparison of unit testing and code inspection, 2003, 14th International Symposium on Software Reliability Engineering (ISSRE 2003).

[11] Tracy Hall, et al. Researcher Bias: The Use of Machine Learning in Software Defect Prediction, 2014, IEEE Transactions on Software Engineering.