A Search-based Training Algorithm for Cost-aware Defect Prediction

Research has yielded approaches to predict future defects in software artifacts based on historical information, thus assisting companies in effectively allocating limited development resources and developers in reviewing each other's code changes. Developers are unlikely to devote the same effort to inspecting each software artifact predicted to contain defects, since the effort varies with the artifact's size (cost) and the number of defects it exhibits (effectiveness). We propose to use Genetic Algorithms (GAs) for training prediction models to maximize their cost-effectiveness. We evaluate the approach on two well-known models, Regression Tree and Generalized Linear Model, and predict defects between multiple releases of six open source projects. Our results show that regression models trained by GAs significantly outperform their traditional counterparts, improving the cost-effectiveness by up to 240%. Often the top 10% of lines of code ranked by the GA-trained models contain up to twice as many defects as the top 10% ranked by their traditional counterparts.
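The idea of training a model directly for cost-effectiveness can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the linear scoring function, the ranking by predicted defect density, the area-under-curve fitness, the GA operators, and all names (`cost_effectiveness`, `train_ga`, `TOY_MODULES`) are assumptions chosen for illustration, and the data are made up. The sketch assumes every module has a positive size and that at least one defect exists.

```python
import random

# Made-up data: (metric values, lines of code, number of defects) per module.
TOY_MODULES = [
    ([0.9, 0.1], 100, 3),
    ([0.2, 0.8], 400, 1),
    ([0.7, 0.3], 50, 2),
    ([0.1, 0.2], 300, 0),
]

def cost_effectiveness(weights, modules):
    """Area under the 'defects found vs. LOC inspected' curve, scaled to [0, 1].

    Modules are inspected in descending order of predicted defect density,
    i.e. the linear score divided by the module's size (an assumption).
    """
    def density(module):
        features, loc, _ = module
        return sum(w * x for w, x in zip(weights, features)) / loc

    ranked = sorted(modules, key=density, reverse=True)
    total_loc = sum(m[1] for m in modules)
    total_defects = sum(m[2] for m in modules)
    area, found = 0.0, 0
    for _, loc, defects in ranked:
        # Trapezoid: defects found rise from `found` to `found + defects`
        # while `loc` further lines are inspected.
        area += loc * (found + defects / 2.0)
        found += defects
    return area / (total_loc * total_defects)

def train_ga(modules, n_features, pop_size=30, generations=40, seed=0):
    """Search for weights that maximize cost-effectiveness on `modules`."""
    rng = random.Random(seed)

    def fitness(w):
        return cost_effectiveness(w, modules)

    population = [[rng.uniform(-1, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            alpha = rng.random()
            # Arithmetic crossover followed by Gaussian mutation of one gene.
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
            child[rng.randrange(n_features)] += rng.gauss(0, 0.1)
            children.append(child)
        population = elite + children
    return max(population, key=fitness)

if __name__ == "__main__":
    best = train_ga(TOY_MODULES, n_features=2)
    print(best, cost_effectiveness(best, TOY_MODULES))
```

The fitness rewards rankings that surface many defects early per line inspected, so the search optimizes the quantity a reviewer cares about rather than a prediction error. Because the ranking is piecewise constant in the weights, a gradient-free search such as a GA is a natural fit for this objective.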
