Impact of the Distribution Parameter of Data Sampling Approaches on Software Defect Prediction Models

Sampling methods are known to affect defect prediction performance, and their configurable parameters can significantly influence the resulting predictions. It is, however, impractical to assess every possible setting in the parameter space of every existing sampling method. One parameter that is common to all sampling methods and easy to tweak is the resulting distribution of defective and non-defective modules in the training data, known as Pfp (the percentage of fault-prone modules). In this paper, we investigate the performance of defect prediction models when the Pfp parameter of the sampling method is tweaked. An empirical study of seven sampling methods and five prediction models over 20 releases of 10 projects described by static code metrics indicates that (1) Area Under the Receiver Operating Characteristic Curve (AUC) performance does not improve after tweaking the Pfp parameter, (2) pf (false alarm) performance degrades as Pfp is increased, and (3) a stable predictor is difficult to achieve across different Pfp rates. We therefore conclude that the Pfp setting can have a large impact on the performance of defect prediction models (except for AUC). We recommend that researchers experiment with the Pfp parameter of their sampling method, since the class distribution of training datasets varies.
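For illustration, the sketch below shows one simple way a sampling method can be driven by a target Pfp: defective modules are randomly duplicated until they make up the requested fraction of the training set. This is a minimal assumption-laden example, not the paper's implementation; the function name `oversample_to_pfp`, the toy data, and the 40% target are all hypothetical.

```python
# Minimal sketch (assumption): random oversampling of defective modules until
# they reach a target Pfp (fraction of fault-prone modules) in the training set.
import numpy as np

def oversample_to_pfp(X, y, target_pfp, seed=None):
    """Duplicate defective rows (y == 1) until they make up target_pfp of the data."""
    rng = np.random.default_rng(seed)
    defective = np.flatnonzero(y == 1)
    clean = np.flatnonzero(y == 0)
    # Solve (len(defective) + k) / (len(clean) + len(defective) + k) >= target_pfp for k.
    needed = int(np.ceil(target_pfp * len(clean) / (1 - target_pfp))) - len(defective)
    if needed <= 0:
        return X, y  # training data already at or above the requested Pfp
    extra = rng.choice(defective, size=needed, replace=True)
    idx = np.concatenate([clean, defective, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

# Hypothetical usage: push the defective share of a toy training set to 40%.
X = np.random.rand(100, 5)
y = np.array([1] * 10 + [0] * 90)
X_bal, y_bal = oversample_to_pfp(X, y, target_pfp=0.40, seed=0)
print(y_bal.mean())  # ~0.40
```

More elaborate samplers (e.g., synthetic oversampling rather than duplication) expose the same knob: the defective-to-clean ratio of the data they hand to the learner, which is exactly the Pfp setting studied here.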
