Some Theory for Generalized Boosting Algorithms

We review various aspects of boosting, clarifying the issues through a few simple results, and relate our work and that of others to the minimax paradigm of statistics. We consider the population version of the boosting algorithm and prove its convergence to the Bayes classifier as a corollary of a general result about Gauss-Southwell optimization in Hilbert space. We then investigate the algorithmic convergence of the sample version and give bounds on the time until perfect separation of the sample. We conclude with some results on the statistical optimality of L2 boosting.
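
For concreteness, the following is a minimal sketch of the sample version of componentwise L2 boosting, in which the greedy coordinate selection mirrors the Gauss-Southwell rule mentioned above: at each step the base learner direction that most reduces the squared-error residual is chosen, and a small shrunken step is taken. This is an illustration under our own assumptions, not the paper's construction; the name l2_boost, the shrinkage factor, and the fixed step count (a crude stand-in for a data-driven early-stopping rule) are hypothetical choices.

    import numpy as np

    def l2_boost(X, y, n_steps=200, shrinkage=0.1):
        # Componentwise L2 boosting sketch: at each step, greedily pick
        # the coordinate whose least-squares fit most reduces the squared
        # residual (a Gauss-Southwell-style greedy choice), then take a
        # small (shrunken) step in that direction.
        n, p = X.shape
        coef = np.zeros(p)
        residual = y.astype(float).copy()
        norms = (X ** 2).sum(axis=0)  # squared column norms, assumed nonzero
        for _ in range(n_steps):
            # Least-squares coefficient of each column against the residual.
            fits = X.T @ residual / norms
            # Reduction in squared error offered by each candidate column.
            gains = fits ** 2 * norms
            j = int(np.argmax(gains))        # greedy coordinate choice
            coef[j] += shrinkage * fits[j]   # shrunken update
            residual -= shrinkage * fits[j] * X[:, j]
        return coef

Capping n_steps plays the role of early stopping here; the consistency and optimality results summarized in the abstract concern principled, data-driven versions of that stopping decision.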
