Cross-project defect prediction models: L'Union fait la force

Existing defect prediction models use product or process metrics and machine learning methods to identify defect-prone source code entities. Different classifiers (e.g., linear regression, logistic regression, or classification trees) have been investigated in the last decade. The results achieved so far are sometimes contrasting and do not show a clear winner. In this paper we present an empirical study aiming at statistically analyzing the equivalence of different defect predictors. We also propose a combined approach, coined as CODEP (COmbined DEfect Predictor), that employs the classification provided by different machine learning techniques to improve the detection of defect-prone entities. The study was conducted on 10 open source software systems and in the context of cross-project defect prediction, that represents one of the main challenges in the defect prediction field. The statistical analysis of the results indicates that the investigated classifiers are not equivalent and they can complement each other. This is also confirmed by the superior prediction accuracy achieved by CODEP when compared to stand-alone defect predictors.

[1]  Chris F. Kemerer,et al.  A Metrics Suite for Object Oriented Design , 2015, IEEE Trans. Software Eng..

[2]  Michele Lanza,et al.  Evaluating defect prediction approaches: a benchmark and an extensive comparison , 2011, Empirical Software Engineering.

[3]  Tim Menzies,et al.  Sharing experiments using open‐source software , 2011, Softw. Pract. Exp..

[4]  Valery Buzungu,et al.  Predicting Fault-prone Components in a Java Legacy System , 2006 .

[5]  Tim Menzies,et al.  Better cross company defect prediction , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[6]  Tibor Gyimóthy,et al.  Empirical validation of object-oriented metrics on open source software for fault prediction , 2005, IEEE Transactions on Software Engineering.

[7]  Andreas Zeller,et al.  Predicting faults from cached history , 2008, ISEC '08.

[8]  Lionel C. Briand,et al.  Predicting fault-prone components in a java legacy system , 2006, ISESE '06.

[9]  Andreas Zeller,et al.  Mining metrics to predict component failures , 2006, ICSE.

[10]  Rongxin Wu,et al.  Dealing with noise in defect prediction , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[11]  Taghi M. Khoshgoftaar,et al.  Evolutionary Optimization of Software Quality Modeling with Multiple Repositories , 2010, IEEE Transactions on Software Engineering.

[12]  J. PaiGanesh,et al.  Empirical Analysis of Software Fault Content and Fault Proneness Using Bayesian Methods , 2007 .

[13]  Elaine J. Weyuker,et al.  Predicting the location and number of faults in large software systems , 2005, IEEE Transactions on Software Engineering.

[14]  Nachiappan Nagappan,et al.  Predicting defects with program dependencies , 2009, ESEM 2009.

[15]  Lionel C. Briand,et al.  A systematic and comprehensive investigation of methods to build and evaluate fault prediction models , 2010, J. Syst. Softw..

[16]  Bart Baesens,et al.  Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings , 2008, IEEE Transactions on Software Engineering.

[17]  Harald C. Gall,et al.  Cross-project defect prediction: a large scale experiment on data vs. domain vs. process , 2009, ESEC/SIGSOFT FSE.

[18]  Witold Pedrycz,et al.  A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[19]  Gerardo Canfora,et al.  Multi-objective Cross-Project Defect Prediction , 2013, 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation.

[20]  Gregory F. Cooper,et al.  A Bayesian Method for the Induction of Probabilistic Networks from Data , 1992 .

[21]  William Marsh,et al.  Predicting software defects in varying development lifecycles using Bayesian nets , 2007, Inf. Softw. Technol..

[22]  Victor R. Basili,et al.  A validation of object oriented metrics as quality indicators , 1996 .

[23]  Sinno Jialin Pan,et al.  Transfer defect learning , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[24]  Koichiro Ochimizu,et al.  Towards logistic regression models for predicting fault-prone code across software projects , 2009, ESEM 2009.

[25]  Martin D. Buhmann,et al.  Radial Basis Functions , 2021, Encyclopedia of Mathematical Geosciences.

[26]  Premkumar T. Devanbu,et al.  Recalling the "imprecision" of cross-project defect prediction , 2012, SIGSOFT FSE.

[27]  Silvio Romero de Lemos Meira,et al.  A Constructive RBF Neural Network for Estimating the Probability of Defects in Software Modules , 2007, 2007 International Joint Conference on Neural Networks.

[28]  Olcay Taner Yildiz,et al.  Software defect prediction using Bayesian networks , 2012, Empirical Software Engineering.

[29]  Mahmoud O. Elish A comparative study of fault density prediction in aspect-oriented systems using MLP, RBF, KNN, RT, DENFIS and SVR models , 2014, Artificial Intelligence Review.

[30]  Tim Menzies,et al.  Local vs. global models for effort estimation and defect prediction , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[31]  Ayse Basar Bener,et al.  Software Defect Identification Using Machine Learning Techniques , 2006, 32nd EUROMICRO Conference on Software Engineering and Advanced Applications (EUROMICRO'06).

[32]  Ron Kohavi,et al.  The Power of Decision Tables , 1995, ECML.

[33]  Ahmed E. Hassan,et al.  Think locally, act globally: Improving defect and effort prediction models , 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).

[34]  Silvio Romero de Lemos Meira,et al.  Enhancing RBF-DDA Algorithm's Robustness: Neural Networks Applied to Prediction of Fault-Prone Software Modules , 2008, IFIP AI.

[35]  L. Futcher,et al.  IFIP – The International Federation for Information Processing , 2004 .

[36]  Ayse Basar Bener,et al.  On the relative value of cross-company and within-company data for defect prediction , 2009, Empirical Software Engineering.

[37]  Ayse Basar Bener,et al.  An industrial case study of classifier ensembles for locating software defects , 2011, Software Quality Journal.

[38]  Joanne Bechta Dugan,et al.  Empirical Analysis of Software Fault Content and Fault Proneness Using Bayesian Methods , 2007, IEEE Transactions on Software Engineering.