Evidence Contrary to the Statistical View of Boosting

The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which theorems can be derived and new algorithms created in general contexts, we show that many important questions remain unanswered. Furthermore, we provide examples that reveal crucial flaws in many of the practical suggestions and new methods derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences.
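
To make the kind of experiment the abstract refers to concrete, the sketch below simulates data from a simple, fully known model and compares the held-out misclassification error of boosted trees against logistic regression. This is a minimal illustration only, assuming scikit-learn's AdaBoostClassifier and LogisticRegression; the simulation model, sample sizes, and estimators are assumptions for this sketch, not the paper's actual experimental design.

# A hypothetical, minimal sketch of the style of experiment the abstract describes:
# simulate data from a simple known model, then compare boosted trees with
# logistic regression on held-out data. The simulation model, sample sizes, and
# estimators below are illustrative assumptions, not the paper's design.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate(n, p=10):
    # Features are uniform on [0, 1]^p; P(y = 1 | x) depends only on the first
    # three coordinates, so the true conditional probabilities are known exactly.
    X = rng.uniform(size=(n, p))
    prob = np.where(X[:, :3].sum(axis=1) > 1.5, 0.8, 0.2)
    y = rng.binomial(1, prob)
    return X, y

X_train, y_train = simulate(1000)
X_test, y_test = simulate(5000)

models = {
    # AdaBoostClassifier's default base learner is a depth-1 tree (a stump).
    "AdaBoost (stumps)": AdaBoostClassifier(n_estimators=500, random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    err = np.mean(model.predict(X_test) != y_test)
    print(f"{name}: test misclassification error = {err:.3f}")

Because the data-generating model is known, a setup of this kind lets one check whether behavior predicted by the statistical view, such as the supposed benefit of a particular loss function or base learner, actually appears in the fitted classifiers.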
