Boosting with Noisy Data: Some Views from Statistical Theory

This letter gives an account of recent findings about AdaBoost in the presence of noisy data, approached from the perspective of statistical theory. We start from the basic assumption of weak hypotheses used in AdaBoost and study its validity and its implications for the generalization error. When data are noisy, we recommend studying the generalization error and comparing it to the optimal Bayes error. Analytic examples are provided showing that running the unmodified AdaBoost forever leads to overfitting. On the other hand, there exist regularized versions of AdaBoost that are consistent, in the sense that the resulting predictions approximately attain the optimal Bayes performance in the limit of large training samples.
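As a companion to the analytic examples, the overfitting claim is easy to probe empirically. The following is a minimal sketch (assuming scikit-learn and NumPy; it is illustrative and not part of the letter's analysis): labels of a linearly separable problem are flipped with probability 0.2, so the optimal Bayes error is 0.2, and we watch the test error of AdaBoost with decision stumps as the number of boosting rounds grows. Capping the number of rounds plays the role of a simple regularizer here, in the spirit of early stopping; all names and parameters are illustrative choices, not taken from the letter.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Synthetic two-dimensional problem: the clean labels are determined by a
# halfspace, then 20% of labels are flipped at random, so no classifier can
# do better than a 0.20 error rate (the Bayes error).
n = 2000
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flip = rng.random(n) < 0.2
y[flip] = 1 - y[flip]
X_train, X_test = X[:1000], X[1000:]
y_train, y_test = y[:1000], y[1000:]

# scikit-learn's AdaBoostClassifier uses depth-1 decision trees (stumps) as
# its default weak hypotheses. We vary only the number of boosting rounds;
# on noisy data the test error often drifts away from the Bayes rate as the
# number of rounds grows (the behavior may vary with the data and seed).
for rounds in (10, 100, 1000):
    clf = AdaBoostClassifier(n_estimators=rounds).fit(X_train, y_train)
    err = (clf.predict(X_test) != y_test).mean()
    print(f"{rounds:4d} rounds: test error = {err:.3f}  (Bayes error = 0.20)")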
