Evaluation of Ensemble Methods in Imbalanced Regression Tasks

Ensemble methods are well known for providing an advantage over single models in a wide range of data mining and machine learning tasks. Their benefits are commonly associated with their ability to reduce bias and/or variance. Ensembles have been studied for both classification and regression tasks with uniform domain preferences; however, they have been thoroughly studied only for imbalanced classification. In this paper we present an empirical study of the predictive ability of the ensemble methods bagging and boosting in regression tasks, using 20 data sets with imbalanced distributions and assuming non-uniform domain preferences. Results show that ensemble methods can improve predictive ability on under-represented values, and that this improvement also influences the models' predictive ability on the average behaviour of the data. Results further show that smaller data sets are prone to larger improvements in predictive accuracy, and that no conclusion could be drawn from the percentage of rare cases alone.
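As a hedged illustration of the bagging procedure studied in the paper (bootstrap resampling of the training set followed by averaging the base models' predictions), the sketch below uses a 1-nearest-neighbour regressor on one-dimensional features as a hypothetical stand-in for the base learners; the function name and all parameters are assumptions, not the paper's implementation.

```python
import numpy as np

def bagging_predict(X_train, y_train, X_test, n_estimators=25, seed=0):
    """Bagging for regression: fit one base model per bootstrap
    resample of the training data, then average the predictions.
    The base model here is a 1-NN regressor over 1-D features."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw n indices with replacement.
        idx = rng.integers(0, n, size=n)
        Xb, yb = X_train[idx], y_train[idx]
        # 1-NN prediction: each test point takes the target of its
        # nearest training point in this bootstrap sample.
        dist = np.abs(X_test[:, None] - Xb[None, :])
        preds.append(yb[dist.argmin(axis=1)])
    # Aggregate by averaging, the standard choice for regression.
    return np.mean(preds, axis=0)
```

For classification the aggregation step would be a majority vote; averaging is what makes this the regression variant the paper evaluates.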
