Robust boosting and its relation to bagging

Several authors have suggested viewing boosting as a gradient descent search for a good fit in function space. At each iteration, observations are re-weighted using the gradient of the underlying loss function. We present an approach that applies weight decay to the observation weights, which is equivalent to "robustifying" the underlying loss function. In the limit of extreme decay, this approach converges to bagging, which can be viewed as boosting with a linear underlying loss function. We illustrate the practical usefulness of weight decay for improving prediction performance and present an equivalence between one form of weight decay and "Huberizing", a statistical method for making loss functions more robust.
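
The link between re-weighting, Huberizing, and bagging can be pictured with a small numerical sketch. It is an illustration under stated assumptions, not the paper's algorithm: it assumes the exponential loss L(m) = exp(-m) on the margin m, and it uses a hypothetical `threshold` parameter below which the loss is continued linearly. The function names are invented for this example. The observation weight is taken to be the magnitude of the loss gradient, so the linear piece caps the weight of badly misclassified points, and pushing the threshold high enough makes the loss linear over all observed margins and the normalized weights uniform, as in bagging.

```python
import numpy as np

def exp_loss_weights(margins):
    """Weights under exponential loss: |dL/dm| for L(m) = exp(-m)."""
    m = np.asarray(margins, dtype=float)
    return np.exp(-m)

def huberized_exp_weights(margins, threshold):
    """Weights under an exponential loss made linear below `threshold`.

    Assumed form (illustrative only): for m >= threshold the loss is exp(-m);
    for m < threshold it continues linearly with slope -exp(-threshold),
    so the gradient magnitude is capped at exp(-threshold).
    """
    m = np.asarray(margins, dtype=float)
    return np.exp(-np.maximum(m, threshold))

def normalize(weights):
    """Rescale weights so they sum to one."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()

margins = np.array([-3.0, -1.0, 0.0, 2.0])

# Plain exponential loss: the point with margin -3 dominates.
print(normalize(exp_loss_weights(margins)))

# Huberized at -1: the two worst points now share the same capped weight.
print(normalize(huberized_exp_weights(margins, threshold=-1.0)))

# Threshold above every margin: the loss is linear everywhere, weights are uniform.
print(normalize(huberized_exp_weights(margins, threshold=10.0)))
```

In this sketch the threshold plays the role of the decay strength: the higher it is set, the less any single hard observation can dominate the next fit.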
