Co-evolutionary multi-population genetic programming for classification in software defect prediction: An empirical case study

Evolving diverse ensembles using genetic programming has recently been proposed for classification problems with unbalanced data. Population diversity is crucial for evolving effective algorithms. Multilevel selection strategies, which add colonization and migration operations, have shown better performance in some applications. In this paper, we therefore analyse the performance of evolving diverse ensembles using genetic programming for software defect prediction on unbalanced data under different selection strategies. We use colonization and migration operators together with three ensemble selection strategies for the multi-objective evolutionary algorithm, and we compare the performance of these operators on software defect prediction datasets with varying levels of data imbalance. Moreover, to generalize the results, gain a broader view and understand the underlying effects, we replicate the same experiments on UCI datasets, which are often used in the evolutionary computing community. Multilevel selection strategies provide reliable results with relatively fast convergence. They also outperform the other evolutionary algorithms investigated in this paper, which are commonly used in this research area. The paper also presents a promising ensemble strategy based on a simple convex hull approach. It raises the question of whether ensemble strategies based on the whole population should also be investigated.
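
The abstract names the convex hull ensemble strategy but does not detail it. The sketch below is a rough illustration under stated assumptions. It assumes each evolved classifier has already been evaluated to a single (false positive rate, true positive rate) point in ROC space. It keeps only the classifiers that lie on the upper ROC convex hull, which is the standard ROC convex hull idea from ROC analysis, and then combines their 0/1 predictions by simple majority vote. The helper names `roc_convex_hull` and `majority_vote`, the tie-breaking rule and the example values are hypothetical. This is not necessarily the procedure used in the paper.

```python
# Hypothetical sketch: ROC convex hull ensemble selection followed by majority
# voting. Not the paper's exact procedure; names and tie-breaking are assumptions.
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def _cross(o: Point, a: Point, b: Point) -> float:
    """2D cross product of vectors OA and OB (positive = counter-clockwise turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: Sequence[Point]) -> List[int]:
    """Return indices of classifiers lying on the upper ROC convex hull.

    The trivial classifiers (0, 0) and (1, 1) are added as anchors (index -1)
    so the hull always spans the whole ROC space; they are not returned.
    """
    anchors = [(0.0, 0.0, -1), (1.0, 1.0, -1)]
    pts = sorted([(fpr, tpr, i) for i, (fpr, tpr) in enumerate(points)] + anchors)
    hull: List[Tuple[float, float, int]] = []
    for p in pts:
        # Scanning left to right, drop the last hull point while it lies on or
        # below the segment joining its predecessor to p (a non-clockwise turn).
        while len(hull) >= 2 and _cross(hull[-2][:2], hull[-1][:2], p[:2]) >= 0:
            hull.pop()
        hull.append(p)
    return [i for _, _, i in hull if i >= 0]


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 predictions (one row per ensemble member) by simple majority.

    Ties are broken in favour of label 1, the minority (defective) class.
    """
    n_members = len(predictions)
    return [1 if 2 * sum(col) >= n_members else 0 for col in zip(*predictions)]


if __name__ == "__main__":
    # Hypothetical ROC points of five evolved classifiers on a defect dataset.
    roc_points = [(0.10, 0.55), (0.25, 0.80), (0.30, 0.60), (0.50, 0.92), (0.70, 0.85)]
    selected = roc_convex_hull(roc_points)
    print("classifiers on the ROC convex hull:", selected)

    # Hypothetical 0/1 predictions of the selected members on four modules.
    member_predictions = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
    print("ensemble prediction:", majority_vote(member_predictions))
```

For the five hypothetical ROC points, the hull keeps classifiers 0, 1 and 3. Classifiers 2 and 4 lie below the hull and are discarded. Breaking voting ties toward the defective class is one possible design choice for unbalanced data, where a missed defect is often considered costlier than a false alarm.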