Boosting with the L2-Loss: Regression and Classification
Peter Bühlmann, Bin Yu

This paper investigates a computationally simple variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2-loss function. Like other boosting algorithms, L2Boost applies a pre-chosen fitting method, called the learner, repeatedly in an iterative fashion. Based on the explicit expression for the refitting of residuals in L2Boost, the case of (symmetric) linear learners is studied in detail for both regression and classification. In particular, with the boosting iteration m acting as the smoothing or regularization parameter, a new exponential bias-variance trade-off is found, with the variance (complexity) term increasing very slowly as m tends to infinity. When the learner is a smoothing spline, an optimal rate of convergence result holds for both regression and classification, and the boosted smoothing spline even adapts to higher-order, unknown smoothness. Moreover, a simple expansion of a (smoothed) 0-1 loss function is derived to reveal the importance of the decision boundary, bias reduction, and the impossibility of an additive bias-variance decomposition in classification. Finally, simulation and real data set results are presented to demonstrate the attractiveness of L2Boost. In particular, we demonstrate that L2Boosting with a novel component-wise cubic smoothing spline is both practical and effective in the presence of high-dimensional predictors.
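The core of L2Boost is simply repeated refitting of residuals: start from an initial fit, and at each step fit the base learner to the current residuals and add that fit to the ensemble, with the number of iterations m playing the role of the regularization parameter. The following is a minimal sketch of this idea, under the assumption of a generic scikit-learn-style regressor as the base learner; the paper itself studies (symmetric) linear learners such as component-wise cubic smoothing splines, for which a shallow regression tree is swapped in here purely for illustration.

```python
# Minimal sketch of L2Boosting: iteratively refit the base learner to residuals.
# Assumption: a shallow regression tree stands in for the paper's smoothing-spline learner.
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def l2_boost(X, y, n_iter=50, make_learner=lambda: DecisionTreeRegressor(max_depth=2)):
    """Fit an L2Boost ensemble; n_iter acts as the smoothing/regularization parameter."""
    fitted = np.zeros_like(y, dtype=float)
    learners = []
    for _ in range(n_iter):
        u = y - fitted                    # current residuals
        g = make_learner().fit(X, u)      # refit the base learner to the residuals
        fitted += g.predict(X)            # update the boosted fit
        learners.append(g)

    def predict(X_new):
        return sum(g.predict(X_new) for g in learners)

    return predict


# Toy usage on a one-dimensional regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.3 * rng.standard_normal(200)
f_hat = l2_boost(X, y, n_iter=30)
```

For classification, the same scheme can be applied to the (recoded) 0-1 labels with the L2 loss, with the sign of the boosted fit used as the classifier; stopping the iteration early is what controls the bias-variance trade-off discussed above.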
