Comparing ensemble learning approaches in genetic programming for classification with unbalanced data

This paper compares three approaches to evolving ensembles in Genetic Programming (GP) for binary classification with unbalanced data. The first uses bagging with sampling to form the training sets, while the other two use Pareto-based multi-objective GP (MOGP) to explicitly handle the trade-off between the two (unequal) classes. Within MOGP, two ways of building the ensemble are compared: using only the evolved Pareto front, and using the whole evolved population, i.e. dominated and non-dominated individuals alike. Experiments on several benchmark binary unbalanced tasks show that smaller, more diverse ensembles chosen during ensemble selection generalise better and therefore perform best, particularly when the ensemble draws on the combined knowledge of the whole evolved MOGP population.
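To make the two MOGP ensemble-building strategies concrete, the sketch below forms a majority-vote ensemble either from the non-dominated (Pareto-front) individuals alone or from the whole evolved population. This is an illustrative assumption rather than the paper's implementation: the Individual representation, the use of per-class accuracies as the two objectives, and the unweighted majority-vote combiner are all placeholders chosen for the sketch.

```python
# Sketch (not the paper's implementation) of the two MOGP ensemble-building strategies:
# voting with only the non-dominated (Pareto-front) individuals versus voting with the
# whole evolved population. An "individual" here is an assumed triple of
# (classifier function, accuracy on the minority class, accuracy on the majority class).
from typing import Callable, List, Sequence, Tuple

Individual = Tuple[Callable[[Sequence[float]], int], float, float]

def dominates(a: Individual, b: Individual) -> bool:
    """True if a is at least as accurate on both classes and strictly better on one."""
    return (a[1] >= b[1] and a[2] >= b[2]) and (a[1] > b[1] or a[2] > b[2])

def pareto_front(population: List[Individual]) -> List[Individual]:
    """Keep only the non-dominated individuals (the evolved Pareto front)."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

def ensemble_predict(members: List[Individual], x: Sequence[float]) -> int:
    """Combine member outputs by unweighted majority vote (one illustrative combiner)."""
    votes = sum(clf(x) for clf, _, _ in members)
    return 1 if 2 * votes >= len(members) else 0

if __name__ == "__main__":
    # Toy population: threshold classifiers on one feature, with made-up per-class accuracies.
    population: List[Individual] = [
        (lambda x: int(x[0] > 0.2), 0.90, 0.55),
        (lambda x: int(x[0] > 0.4), 0.80, 0.70),
        (lambda x: int(x[0] > 0.6), 0.75, 0.65),  # dominated by the individual above
        (lambda x: int(x[0] > 0.8), 0.60, 0.85),
    ]
    front_ensemble = pareto_front(population)   # strategy 1: Pareto front only
    full_ensemble = population                  # strategy 2: whole evolved population
    x = [0.5]
    print(len(front_ensemble), ensemble_predict(front_ensemble, x))
    print(len(full_ensemble), ensemble_predict(full_ensemble, x))
```

The sketch only illustrates how the two candidate member pools differ in size and composition; the paper's further ensemble-selection step, which prunes the pool to a smaller, more diverse subset, is not shown.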