An extensive analysis of search-based techniques for predicting defective classes

Abstract: Despite constant planning, effective documentation and proper implementation throughout the software life cycle, many defects still occur. Various empirical studies have found that prediction models developed using software metrics can be used to predict these defects. Researchers have advocated the use of search-based techniques and their hybridized versions in the literature for developing software quality prediction models. This study conducts an extensive comparison of 20 search-based techniques, 16 hybridized techniques and 17 machine-learning techniques to develop software defect prediction models using 17 data sets. The comparison framework used in the study is efficient as it (i) deals with the stochastic nature of the techniques, (ii) provides a fair comparison amongst the techniques, (iii) promotes repeatability of the study and (iv) statistically validates the results. The results indicate that search-based techniques and their hybridized versions show promising ability for predicting defective classes.
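As a rough illustration of points (i) and (iv) of the comparison framework, the sketch below averages a stochastic technique's score over several seeded runs and then ranks techniques across data sets with a Friedman-style statistic. The abstract does not name the statistical test or the number of runs, so the choice of the Friedman test, the function names (`mean_over_runs`, `average_ranks`, `friedman_statistic`) and the example scores are all assumptions made for illustration only.

```python
from statistics import mean


def mean_over_runs(evaluate, seeds):
    """Average a stochastic technique's score over several seeded runs.

    `evaluate` is a hypothetical callable: seed -> performance score.
    """
    return mean(evaluate(seed) for seed in seeds)


def average_ranks(scores):
    """Mean rank of each technique across data sets (rank 1 = best).

    scores[d][t] is the score of technique t on data set d, higher is better.
    Tied techniques share the average of the ranks they occupy.
    """
    n_tech = len(scores[0])
    totals = [0.0] * n_tech
    for row in scores:
        order = sorted(range(n_tech), key=lambda t: -row[t])
        i = 0
        while i < n_tech:
            j = i
            # Extend j over the group of techniques tied with order[i].
            while j + 1 < n_tech and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
            for k in range(i, j + 1):
                totals[order[k]] += shared
            i = j + 1
    return [total / len(scores) for total in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic computed from average ranks.

    No tie correction is applied; this is a simplification.
    """
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)


# Made-up scores: 3 data sets (rows) x 3 techniques (columns).
example_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.70, 0.90, 0.80],
]
example_ranks = average_ranks(example_scores)
example_stat = friedman_statistic(example_scores)
```

In practice each cell of `example_scores` would itself come from `mean_over_runs`, so that a single lucky or unlucky run of a stochastic technique does not decide its rank.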
