Getting the Most Out of Ensemble Selection

We investigate four previously unexplored aspects of ensemble selection, a procedure for building ensembles of classifiers. First, we test whether adjusting model predictions to put them on a canonical scale makes the ensembles more effective. Second, we explore the performance of ensemble selection when different amounts of data are available for ensemble hillclimbing. Third, we quantify the benefit of ensemble selection's ability to optimize to arbitrary metrics. Fourth, we study the performance impact of pruning the number of models available for ensemble selection. Based on our results, we present improved ensemble selection methods that double the benefit of the original method.
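To make the procedure concrete, below is a minimal sketch of the core ensemble selection loop: greedy forward selection with replacement, where at each step the model that most improves a chosen metric on a held-out hillclimb set is added to the ensemble. This is an illustrative outline, not the authors' exact implementation; the function, parameter, and variable names are assumptions, and details such as initialization, bagged selection, and early stopping are omitted.

```python
import numpy as np

def ensemble_selection(preds, y_hillclimb, metric, n_steps=50):
    """Greedily add models (with replacement) to maximize `metric`.

    preds        -- dict mapping model name -> predicted probabilities
                    on the hillclimb set (precomputed)
    y_hillclimb  -- true labels for the hillclimb set
    metric       -- callable(y_true, y_pred) -> score, higher is better
                    (any metric can be plugged in: accuracy, AUC, F-score, ...)
    """
    ensemble = []                                   # selected model names; repeats allowed
    running_sum = np.zeros(len(y_hillclimb), dtype=float)

    for _ in range(n_steps):
        best_name, best_score = None, -np.inf
        for name, p in preds.items():
            # The ensemble's prediction is the simple average of its members.
            candidate = (running_sum + p) / (len(ensemble) + 1)
            score = metric(y_hillclimb, candidate)
            if score > best_score:
                best_name, best_score = name, score
        ensemble.append(best_name)
        running_sum += preds[best_name]

    return ensemble

# Example usage with accuracy as the hillclimbing metric (hypothetical data):
# accuracy = lambda y, p: np.mean((p > 0.5) == y)
# chosen = ensemble_selection(model_preds, y_val, accuracy)
```

Because the metric is an arbitrary callable evaluated on the hillclimb set, the same loop can optimize directly to whatever performance measure matters for the task, which is the property the third experiment quantifies.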
