Evolving Diverse Ensembles Using Genetic Programming for Classification With Unbalanced Data

In classification, machine learning algorithms can suffer a performance bias when data sets are unbalanced. Data sets are unbalanced when at least one class is represented by only a small number of training examples (called the minority class), while the other class(es) make up the majority. In this scenario, classifiers can have good accuracy on the majority class, but very poor accuracy on the minority class(es). This paper proposes a multiobjective genetic programming (MOGP) approach to evolving accurate and diverse ensembles of genetic program classifiers with good performance on both the minority and majority of classes. The evolved ensembles comprise of nondominated solutions in the population where individual members vote on class membership. This paper evaluates the effectiveness of two popular Pareto-based fitness strategies in the MOGP algorithm (SPEA2 and NSGAII), and investigates techniques to encourage diversity between solutions in the evolved ensembles. Experimental results on six (binary) class imbalance problems show that the evolved ensembles outperform their individual members, as well as single-predictor methods such as canonical GP, naive Bayes, and support vector machines, on highly unbalanced tasks. This highlights the importance of developing an effective fitness evaluation strategy in the underlying MOGP algorithm to evolve good ensemble members.

[1]  F. Mosteller,et al.  Fundamentals of exploratory analysis of variance , 1991 .

[2]  Xin Yao,et al.  Diversity analysis on imbalanced data sets by using ensemble models , 2009, 2009 IEEE Symposium on Computational Intelligence and Data Mining.

[3]  Malcolm I. Heywood,et al.  GP Classification under Imbalanced Data sets: Active Sub-sampling and AUC Approximation , 2008, EuroGP.

[4]  Marco Laumanns,et al.  SPEA2: Improving the Strength Pareto Evolutionary Algorithm For Multiobjective Optimization , 2002 .

[5]  Gary B. Lamont,et al.  Evolutionary Algorithms for Solving Multi-Objective Problems (Genetic and Evolutionary Computation) , 2006 .

[6]  Bob McKay,et al.  Anti-correlation Measures in Genetic Programming R I ( , 2001 .

[7]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[8]  Malcolm I. Heywood,et al.  Training genetic programming on half a million patterns: an example from anomaly detection , 2005, IEEE Transactions on Evolutionary Computation.

[9]  Yanqing Zhang,et al.  SVMs Modeling for Highly Imbalanced Classification , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[10]  Mark Johnston,et al.  Multi-Objective Genetic Programming for Classification with Unbalanced Data , 2009, Australasian Conference on Artificial Intelligence.

[11]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.

[12]  Gustavo E. A. P. A. Batista,et al.  Balancing Strategies and Class Overlapping , 2005, IDA.

[13]  Xin Yao,et al.  Making use of population information in evolutionary artificial neural networks , 1998, IEEE Trans. Syst. Man Cybern. Part B.

[14]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[15]  J. Holmes Differential Negative Reinforcement Improves Classifier System Learning Rate in Two-Class Problems with Unequal Base Rates , 1990 .

[16]  D. Opitz,et al.  Popular Ensemble Methods: An Empirical Study , 1999, J. Artif. Intell. Res..

[17]  Nathalie Japkowicz,et al.  The class imbalance problem: A systematic study , 2002, Intell. Data Anal..

[18]  Wolfgang Banzhaf,et al.  Evolving Teams of Predictors with Linear Genetic Programming , 2001, Genetic Programming and Evolvable Machines.

[19]  Hussein A. Abbass Pareto Neuro-Ensembles , 2003, Australian Conference on Artificial Intelligence.

[20]  Aravind Seshadri,et al.  A FAST ELITIST MULTIOBJECTIVE GENETIC ALGORITHM: NSGA-II , 2000 .

[21]  Zhi-Hua Zhou,et al.  Exploratory Undersampling for Class-Imbalance Learning , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[22]  Xin Yao,et al.  Theoretical Study of the Relationship between Diversity and Single-Class Measures for Class Imbalance Learning , 2009, 2009 IEEE International Conference on Data Mining Workshops.

[23]  Xin Yao,et al.  Ieee Transactions on Knowledge and Data Engineering 1 Relationships between Diversity of Classification Ensembles and Single-class Performance Measures , 2022 .

[24]  Ann Q. Gates,et al.  TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING , 2005 .

[25]  Xin Yao,et al.  Ensemble Learning Using Multi-Objective Evolutionary Algorithms , 2006, J. Math. Model. Algorithms.

[26]  Hussein A. Abbass Anti-correlation Measures in Genetic Programming , 2001 .

[27]  M. Heywood,et al.  Classification as Clustering: A Pareto Cooperative-Competitive GP Approach , 2011, Evolutionary Computation.

[28]  Hussein A. Abbass,et al.  Pareto neuro-evolution: constructing ensemble of neural networks using multi-objective optimization , 2003, The 2003 Congress on Evolutionary Computation, 2003. CEC '03..

[29]  Bernhard Sendhoff,et al.  Pareto-Based Multiobjective Machine Learning: An Overview and Case Studies , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[30]  Mark Johnston,et al.  Genetic Programming for Classification with Unbalanced Data , 2010, EuroGP.

[31]  David W. Opitz,et al.  Generating Accurate and Diverse Members of a Neural-Network Ensemble , 1995, NIPS.

[32]  Dariu Gavrila,et al.  An Experimental Study on Pedestrian Classification , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Nitesh V. Chawla,et al.  Exploiting Diversity in Ensembles: Improving the Performance on Unbalanced Datasets , 2007, MCS.

[34]  Huanhuan Chen,et al.  Multiobjective Neural Network Ensembles Based on Regularized Negative Correlation Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[35]  John R. Koza,et al.  Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.

[36]  Mark Johnston,et al.  A Comparison of Classification Strategies in Genetic Programming with Unbalanced Data , 2010, Australasian Conference on Artificial Intelligence.

[37]  Christian Gagné,et al.  Ensemble learning for free with evolutionary algorithms? , 2007, GECCO '07.

[38]  Lothar Thiele,et al.  A Tutorial on the Performance Assessment of Stochastic Multiobjective Optimizers , 2006 .

[39]  Zhi-Hua Zhou,et al.  Exploratory Under-Sampling for Class-Imbalance Learning , 2006, Sixth International Conference on Data Mining (ICDM'06).

[40]  José Salvador Sánchez,et al.  Strategies for learning in class imbalance problems , 2003, Pattern Recognit..

[41]  Huanhuan Chen,et al.  Predictive Ensemble Pruning by Expectation Propagation , 2009, IEEE Transactions on Knowledge and Data Engineering.

[42]  Gary B. Lamont,et al.  Evolutionary Algorithms for Solving Multi-Objective Problems , 2002, Genetic Algorithms and Evolutionary Computation.

[43]  Salvatore J. Stolfo,et al.  Credit Card Fraud Detection Using Meta-Learning: Issues and Initial Results 1 , 1997 .

[44]  Hussein A. Abbass,et al.  Anticorrelation: A Diversity Promoting Mechanisms in Ensemble Learning , 2001 .

[45]  Hussein A. Abbass,et al.  Pareto-Optimal Approaches to Neuro-Ensemble Learning , 2006, Multi-Objective Machine Learning.

[46]  Xin Yao,et al.  Neural-Based Learning Classifier Systems , 2008, IEEE Transactions on Knowledge and Data Engineering.

[47]  Ken Sharman,et al.  A Genetic Programming Approach for Bankruptcy Prediction Using a Highly Unbalanced Database , 2007, EvoWorkshops.

[48]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[49]  Foster J. Provost,et al.  Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction , 2003, J. Artif. Intell. Res..

[50]  JapkowiczNathalie,et al.  The class imbalance problem: A systematic study , 2002 .

[51]  Terence Soule,et al.  Novel ways of improving cooperation and performance in ensemble classifiers , 2007, GECCO '07.

[52]  Mark Johnston,et al.  Evolving ensembles in multi-objective genetic programming for classification with unbalanced data , 2011, GECCO '11.

[53]  Marco Laumanns,et al.  SPEA2: Improving the strength pareto evolutionary algorithm , 2001 .

[54]  Xin Yao,et al.  Multi-Objective Approaches to Optimal Testing Resource Allocation in Modular Software Systems , 2010, IEEE Transactions on Reliability.

[55]  Xin Yao,et al.  DIVACE: Diverse and Accurate Ensemble Learning Algorithm , 2004, IDEAL.

[56]  Andrew R. McIntyre,et al.  Multi-objective competitive coevolution for efficient GP classifier problem decomposition , 2007, 2007 IEEE International Conference on Systems, Man and Cybernetics.

[57]  Peter Ross,et al.  Dynamic Training Subset Selection for Supervised Learning in Genetic Programming , 1994, PPSN.

[58]  Stan Matwin,et al.  Addressing the Curse of Imbalanced Training Sets: One-Sided Selection , 1997, ICML.

[59]  Xin Yao,et al.  Diversity exploration and negative correlation learning on imbalanced data sets , 2009, 2009 International Joint Conference on Neural Networks.

[60]  R. Barandelaa,et al.  Strategies for learning in class imbalance problems , 2003, Pattern Recognit..