Evidence Contrary to the Statistical View of Boosting

The statistical perspective on boosting views the algorithms as optimization procedures, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that calls this view into question. Although the statistical perspective provides a theoretical framework within which theorems can be derived and new algorithms developed in general contexts, we show that many important questions remain unanswered. Furthermore, we provide examples that reveal crucial flaws in many of the practical suggestions and new methods derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences.
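The experiments referred to above follow a common pattern: simulate data from a simple known model, run a boosting algorithm, and compare its behavior against what the statistical view would predict. The sketch below is a minimal, hypothetical illustration of that setup, not the paper's actual experiments: it assumes a toy model where the class probability depends on a single feature, and it implements plain AdaBoost over decision stumps so the whole pipeline is self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, d=5):
    # Hypothetical simulation model: features uniform on [0, 1],
    # label depends only on the first coordinate (Bayes error = 0.2).
    X = rng.uniform(size=(n, d))
    p = np.where(X[:, 0] > 0.5, 0.8, 0.2)
    y = np.where(rng.uniform(size=n) < p, 1, -1)
    return X, y

def stump_fit(X, y, w):
    # Exhaustive search over (feature, threshold, sign)
    # minimizing the weighted 0-1 error.
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] > t, s, -s)
                err = np.sum(w * (pred != y))
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost(X, y, rounds=20):
    # Plain AdaBoost (Freund & Schapire): reweight examples after
    # each round, weight each stump by alpha.
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        err, j, t, s = stump_fit(X, y, w)
        err = max(err, 1e-12)           # guard against division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(X[:, j] > t, s, -s)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(X[:, j] > t, s, -s) for a, j, t, s in ensemble)
    return np.sign(score)

Xtr, ytr = simulate(200)
Xte, yte = simulate(2000)
model = adaboost(Xtr, ytr)
test_error = np.mean(predict(model, Xte) != yte)
print(round(test_error, 3))
```

Because the generating model is known, the test error can be compared directly to the Bayes error (0.2 here), which is the kind of controlled comparison a simulation study makes possible.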
