Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers

There is a large literature explaining why AdaBoost is a successful classifier. This literature focuses on classifier margins and on boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been noted to be incomplete. The random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for similar reasons. Although both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, a random forest is a self-averaging, interpolating algorithm that creates what we denote a "spiked-smooth" classifier, and we view AdaBoost in the same light. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that boosting algorithms for classification require regularization or early stopping and should be limited to low-complexity classes of base learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees, and without direct regularization or early stopping.
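As a rough illustration of the concluding recommendation, the following is a minimal scikit-learn sketch, not taken from the paper: the synthetic dataset, tree depths, and ensemble sizes are illustrative assumptions. It fits AdaBoost on decision stumps, AdaBoost on large trees, and a random forest, and reports training and test accuracy.

```python
# Illustrative sketch only: compares boosting with stumps (conventional wisdom)
# against boosting with large trees and a random forest. Hyperparameters are
# assumptions for demonstration, not the authors' experimental setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # Conventional wisdom: boost low-complexity stumps.
    "AdaBoost (stumps)": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1),  # 'base_estimator' in scikit-learn < 1.2
        n_estimators=500, random_state=0),
    # The recommendation discussed above: boost large trees. Depth 10 is an
    # arbitrary stand-in for "large"; note that scikit-learn stops boosting if a
    # base tree reaches zero weighted training error, so fully grown
    # (max_depth=None) trees may collapse the ensemble to a single tree here.
    "AdaBoost (large trees)": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=10),
        n_estimators=500, random_state=0),
    # Random forest: a self-averaging ensemble of large trees.
    "Random forest": RandomForestClassifier(n_estimators=500, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: train={model.score(X_train, y_train):.3f}, "
          f"test={model.score(X_test, y_test):.3f}")
```

On data like this one would typically expect the large-tree ensembles to fit the training set nearly perfectly while still generalizing well, which is the "spiked-smooth" interpolating behavior the abstract describes; actual numbers will vary with the data and settings.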
