An Empirical Comparison of Supervised Ensemble Learning Approaches

We present an extensive empirical comparison of twenty prototypical supervised ensemble learning algorithms, including Boosting, Bagging, Random Forests, Rotation Forests, Arc-X4, Class-Switching and their variants, as well as more recent techniques such as Random Patches. The algorithms are compared against each other in terms of threshold, ranking/ordering, and probability metrics on nineteen binary-labeled UCI benchmark datasets. We also examine the influence of two base learners, CART and Extremely Randomized Trees, and the effect of calibrating the models via Isotonic Regression on each performance metric. The selected datasets have been used in several previous empirical studies and cover a range of application domains. The experimental analysis is restricted to the hundred most relevant features, selected with the SNR filter method, in order to substantially reduce the computational burden of the simulations. The source code and the detailed results of our study are publicly available.
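As a concrete illustration, the following minimal sketch (not the authors' released code) reproduces the shape of the pipeline described above in scikit-learn: a signal-to-noise-ratio feature filter keeping the 100 top-ranked features, an Extremely Randomized Trees ensemble, and isotonic calibration, evaluated with one ranking metric (AUC) and one probability metric (Brier score). The snr_score helper, the synthetic dataset, and all parameter values are illustrative assumptions, not taken from the study.

```python
# Sketch of the described setup: SNR feature filter -> ensemble -> isotonic calibration.
# Assumes scikit-learn >= 1.0; ExtraTreesClassifier stands in for the
# Extremely Randomized Trees base learner used in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.pipeline import Pipeline


def snr_score(X, y):
    """Signal-to-noise ratio per feature: |mean_pos - mean_neg| / (std_pos + std_neg)."""
    pos, neg = X[y == 1], X[y == 0]
    eps = 1e-12  # guard against constant features
    return np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / (pos.std(axis=0) + neg.std(axis=0) + eps)


# Synthetic stand-in for a binary UCI-style dataset.
X, y = make_classification(n_samples=1000, n_features=500, n_informative=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

pipeline = Pipeline([
    ("snr", SelectKBest(score_func=snr_score, k=100)),        # keep the 100 most relevant features
    ("ensemble", CalibratedClassifierCV(                       # isotonic calibration via cross-validation
        ExtraTreesClassifier(n_estimators=200, random_state=0),
        method="isotonic", cv=5)),
])
pipeline.fit(X_train, y_train)

proba = pipeline.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))        # ranking/ordering metric
print("Brier:", brier_score_loss(y_test, proba))   # probability metric
```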
