Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting

The authors do the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting, which originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that permits a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology that illustrates how a fundamental innovation can penetrate every nook and cranny of statistical thinking and practice. They introduce the reader to one particular interpretation of boosting and then display its potential with extensions from classification (where it all started) to least squares, exponential family models and survival analysis; to base-learners other than trees, such as smoothing splines; to degrees of freedom and regularization; and to fascinating recent work in model selection. The uninitiated reader will find that the authors have done a nice job of presenting a coherent and useful interpretation of boosting. A reader who has watched the business of boosting for a while, however, may quibble with the authors over details of the historical record and, more importantly, over their optimism about the current state of theoretical knowledge. In fact, as much as "the statistical view" has proven fruitful, it has also produced some ideas about why boosting works that may be misconceived, and some recommendations that may be misguided. [arXiv:0804.2752]
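To make concrete the flavor of boosting at issue, here is a minimal sketch of componentwise L2 boosting, the functional-gradient-descent view with squared-error loss that the comment refers to (cf. [6], [7], [9]). Everything in it (the function name l2_boost, the step count, the shrinkage parameter nu) is an illustrative assumption, not code from the paper or its accompanying software.

    import numpy as np

    def l2_boost(X, y, n_steps=200, nu=0.1):
        # Componentwise L2 boosting: repeatedly fit the residuals (the
        # negative gradient of the squared-error loss) with the best
        # single-variable least-squares base-learner, then take a small,
        # shrunken step in that direction.
        n, p = X.shape
        coef = np.zeros(p)
        intercept = y.mean()            # stage 0: the constant fit
        resid = y - intercept
        for _ in range(n_steps):
            # Least-squares slope of the residuals on each column of X.
            betas = X.T @ resid / (X ** 2).sum(axis=0)
            sse = ((resid[:, None] - X * betas) ** 2).sum(axis=0)
            j = int(np.argmin(sse))     # base-learner with the best fit
            coef[j] += nu * betas[j]    # shrinkage nu is the regularization
            resid -= nu * betas[j] * X[:, j]
        return intercept, coef

    # Toy usage: recover a sparse linear signal.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    y = 2.0 * X[:, 3] - X[:, 7] + 0.5 * rng.standard_normal(200)
    intercept, coef = l2_boost(X, y)
    print(np.round(coef, 2))            # mass concentrates on coordinates 3 and 7

The shrinkage factor nu is the regularization knob of the overview's title; a small nu combined with many steps typically traces out a lasso-like coefficient path.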

References

[1] Mason, L., Baxter, J., Bartlett, P. L. and Frean, M. (2000). Functional gradient techniques for combining hypotheses. In Advances in Large Margin Classifiers. MIT Press, Cambridge, MA.

[2] Buja, A., Stuetzle, W. and Shen, Y. (2005). Loss functions for binary class probability estimation and classification: Structure and applications. Technical report, Univ. of Pennsylvania.

[3] Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119–139.

[4] Mease, D. and Wyner, A. (2008). Evidence contrary to the statistical view of boosting. J. Mach. Learn. Res. 9 131–156.

[5] Mease, D., Wyner, A. and Buja, A. (2007). Boosted classification trees and class probability/quantile estimation. J. Mach. Learn. Res. 8 409–439.

[6] Bühlmann, P. and Yu, B. (2003). Boosting with the L2 loss: Regression and classification. J. Amer. Statist. Assoc. 98 324–339.

[7] Bühlmann, P. and Yu, B. (2001). Boosting with the L2-loss: Regression and classification. Preprint.

[8] Wyner, A. J. (2003). On boosting and the exponential loss. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS).

[9] Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189–1232.

[10] Ridgeway, G. (1999). The state of boosting. Computing Science and Statistics 31 172–181.

[11] Amit, Y. and Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation 9 1545–1588.

[12] Schapire, R. E. and Singer, Y. (1998). Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT).

[13] Breiman, L. (1997). Arcing the edge. Technical Report 486, Dept. of Statistics, Univ. of California, Berkeley.

[14] Breiman, L. (2000). Some infinity theory for predictor ensembles. Technical Report 577, Dept. of Statistics, Univ. of California, Berkeley.

[15] Breiman, L. (2004). Population theory for boosting ensembles. Ann. Statist. 32 1–11.

[16] Freund, Y. and Schapire, R. E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the Second European Conference on Computational Learning Theory (EuroCOLT '95) 23–37. Springer.

[17] Breiman, L. (1998). Arcing classifiers (with discussion and a rejoinder by the author). Ann. Statist. 26 801–849.

[18] Friedman, J. H. (2002). Stochastic gradient boosting. Comput. Statist. Data Anal. 38 367–378.

[19] Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning (ICML) 148–156.

[20] Breiman, L. (1999). Random forests -- random features. Technical Report 567, Dept. of Statistics, Univ. of California, Berkeley.
