On the Convergence of Boosting Procedures

A boosting algorithm seeks to minimize a loss function empirically in a greedy fashion. The resulting estimator takes an additive form and is built iteratively by applying a base estimator (or learner) to updated samples that depend on the previous iterations. This paper studies the convergence of boosting when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove that boosting's greedy optimization converges to the infimum of the loss function over the linear span. As a by-product, these results reveal the importance of restricting the greedy search step sizes, as is known in practice through the works of Friedman and others.
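The greedy procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm. It assumes a finite dictionary of basis functions stored as the columns of a matrix `B`, evaluated at the sample points. At each iteration it picks the basis function most correlated with the negative gradient of the empirical loss, then searches for a step size only within a restricted interval `[-max_step, max_step]` on a grid. The names `greedy_boost`, `max_step`, and the grid resolution are hypothetical choices made for this example.

```python
import numpy as np


def greedy_boost(B, y, loss, dloss, n_iter=200, max_step=0.1, n_grid=40):
    """Greedy boosting over the linear span of the columns of B (illustrative sketch).

    B        : (n_samples, n_basis) array; column j is basis function g_j at the samples.
    loss     : loss(f, y) -> empirical loss (averaged over samples).
    dloss    : dloss(f, y) -> per-sample derivative of the loss w.r.t. f.
    max_step : hypothetical cap on |alpha|, i.e. the restricted step size.
    """
    n, m = B.shape
    f = np.zeros(n)        # current additive fit, evaluated at the samples
    coef = np.zeros(m)     # coefficients of f in the dictionary
    history = [loss(f, y)]
    # Candidate step sizes: 0 plus a grid inside [-max_step, max_step].
    steps = np.concatenate(([0.0], np.linspace(-max_step, max_step, n_grid)))
    for _ in range(n_iter):
        grad = dloss(f, y)
        # Basis function most correlated with the negative gradient
        # (scaling the gradient by 1/n would not change the argmax).
        j = int(np.argmax(np.abs(B.T @ grad)))
        # Restricted line search along g_j; step 0 is always a candidate,
        # so the empirical loss never increases.
        vals = [loss(f + a * B[:, j], y) for a in steps]
        a = steps[int(np.argmin(vals))]
        f = f + a * B[:, j]
        coef[j] += a
        history.append(loss(f, y))
    return coef, history


def squared_loss(f, y):
    return 0.5 * np.mean((f - y) ** 2)


def squared_loss_grad(f, y):
    # Per-sample derivative of 0.5 * (f - y)^2 with respect to f.
    return f - y


# Usage on synthetic data whose target lies in the span of the dictionary.
rng = np.random.default_rng(0)
B = rng.standard_normal((100, 20))
w = np.zeros(20)
w[[2, 7, 11]] = [1.5, -1.0, 0.5]
y = B @ w
coef, history = greedy_boost(B, y, squared_loss, squared_loss_grad,
                             n_iter=500, max_step=0.1)
print(f"empirical loss: {history[0]:.4f} -> {history[-1]:.6f}")
```

Any differentiable loss can be passed through `loss` and `dloss`; the squared loss is used only because its behavior is easy to check. Capping the step at `max_step` forces the fit to move in small increments, which is one simple form of the step-size restriction the abstract refers to.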

[1] L. Breiman. Arcing classifier (with discussion and a rejoinder by the author). 1998.

[2] Yoav Freund et al. Boosting the margin: A new explanation for the effectiveness of voting methods. ICML, 1997.

[3] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1997.

[4] J. Friedman et al. Special Invited Paper. Additive logistic regression: A statistical view of boosting. 2000.

[5] B. Yu et al. Boosting with the L_2-Loss: Regression and Classification. 2001.

[6] L. Jones. A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training. 1992.

[7] Yoram Singer et al. Logistic Regression, AdaBoost and Bregman Distances. Machine Learning, 2000.

[8] Peter L. Bartlett et al. Functional Gradient Techniques for Combining Hypotheses. 2000.

[9] Yoram Singer et al. Improved Boosting Algorithms Using Confidence-rated Predictions. COLT '98, 1998.

[10] P. Bühlmann et al. Boosting with the L2-loss: regression and classification. 2001.

[11] P. Bühlmann et al. Boosting With the L2 Loss. 2003.

[12] L. Breiman. Arcing Classifiers. 1998.

[13] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1995.

[14] Stéphane Mallat et al. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process., 1993.

[15] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. 2003.

[16] Peter L. Bartlett et al. Efficient agnostic learning of neural networks with bounded fan-in. IEEE Trans. Inf. Theory, 1996.

[17] L. Breiman. Some infinity theory for predictor ensembles. 2000.

[18] Tong Zhang et al. Sequential greedy approximation for certain convex optimization problems. IEEE Trans. Inf. Theory, 2003.

[19] J. Friedman. Greedy function approximation: A gradient boosting machine. 2001.

[20] Leo Breiman et al. Prediction Games and Arcing Algorithms. Neural Computation, 1999.