Cross-Project Defect Prediction (CPDP) as a means to focus quality assurance of software projects has been under heavy investigation in recent years. However, it is unclear which of the many proposed approaches performs best, due to a lack of replication of results and to diverse experiment setups that use different performance metrics and different underlying data. In this article, we provide a benchmark for CPDP. We replicate 24 approaches proposed by researchers between 2008 and 2015 and evaluate their performance on software products from five different data sets. Based on our benchmark, we determined that an approach proposed by Camargo Cruz and Ochimizu (2009), based on data standardization, performs best and is always ranked among the statistically significantly best results for all metrics and data sets. Approaches proposed by Turhan et al. (2009), Menzies et al. (2011), and Watanabe et al. (2008) are also nearly always among the best results. Moreover, we determined that predictions only seldom achieve a high performance of 0.75 recall, precision, and accuracy. Thus, CPDP has not yet reached a point where the performance of the results is sufficient for application in practice.
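To illustrate the kind of data standardization underlying the best-ranked approach, the sketch below assumes that standardization means log-transforming the software metrics and shifting the training (source) data so that its per-metric median matches the median of the target project; this is a minimal, hedged illustration, not the reference implementation from Camargo Cruz and Ochimizu (2009), and the function and variable names (standardize_to_target, source_metrics, target_metrics) are hypothetical.

import numpy as np

def standardize_to_target(source_metrics, target_metrics):
    """Align log-transformed source metrics to the target project's median.

    source_metrics, target_metrics: 2D arrays (rows = modules, columns = metrics).
    Returns the standardized source data and the log-transformed target data.
    """
    # Log transform to reduce skew; log1p avoids log(0) for zero-valued metrics.
    log_source = np.log1p(source_metrics)
    log_target = np.log1p(target_metrics)
    # Shift each metric so the source median matches the target project's median.
    shift = np.median(log_target, axis=0) - np.median(log_source, axis=0)
    return log_source + shift, log_target

# Usage with stand-in metric matrices (illustrative data only).
rng = np.random.default_rng(0)
src = rng.integers(0, 500, size=(100, 4)).astype(float)
tgt = rng.integers(0, 300, size=(80, 4)).astype(float)
src_std, tgt_std = standardize_to_target(src, tgt)

A classifier trained on src_std can then be applied to tgt_std, since both now live on a comparable, target-aligned scale.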
[1] Jens Grabowski et al., "Correction of 'A Comparative Study to Benchmark Cross-project Defect Prediction Approaches'", IEEE Transactions on Software Engineering, 2019.
[2] Jens Grabowski et al., "A Comparative Study to Benchmark Cross-Project Defect Prediction Approaches", IEEE Transactions on Software Engineering, 2018.
[3] Koichiro Ochimizu et al., "Towards logistic regression models for predicting fault-prone code across software projects", 2009 3rd International Symposium on Empirical Software Engineering and Measurement, 2009.
[4] Harald C. Gall et al., "Cross-project defect prediction: a large scale experiment on data vs. domain vs. process", ESEC/SIGSOFT FSE, 2009.