Explaining AdaBoost

Boosting is an approach to machine learning based on the idea of creating a highly accurate prediction rule by combining many relatively weak and inaccurate rules. The AdaBoost algorithm of Freund and Schapire was the first practical boosting algorithm, and it remains one of the most widely used and studied, with applications in numerous fields. This chapter reviews some of the many perspectives and analyses that have been applied to explain or understand AdaBoost as a learning method, comparing the strengths and weaknesses of the various approaches.
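
As a concrete reference point for the algorithm being discussed, here is a minimal sketch of AdaBoost using decision stumps as the weak learners, assuming binary labels in {-1, +1}. The helper names (`fit_stump`, `stump_predict`) are illustrative, not taken from the chapter.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: pick the (feature, threshold, sign)
    with the lowest weighted classification error."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thresh, 1, -1)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (j, thresh, sign)
    return best, best_err

def stump_predict(stump, X):
    j, thresh, sign = stump
    return sign * np.where(X[:, j] <= thresh, 1, -1)

def adaboost(X, y, T=50):
    """AdaBoost: returns the ensemble as a list of (alpha_t, stump_t)."""
    m = len(y)
    w = np.full(m, 1.0 / m)            # initial distribution D_1(i) = 1/m
    ensemble = []
    for t in range(T):
        stump, eps = fit_stump(X, y, w)
        eps = max(eps, 1e-10)          # guard against division by zero
        if eps >= 0.5:                 # weak learner no better than chance
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred) # up-weight misclassified examples
        w /= w.sum()                   # normalize to get D_{t+1}
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    """Final combined rule: H(x) = sign(sum_t alpha_t * h_t(x))."""
    return np.sign(sum(a * stump_predict(s, X) for a, s in ensemble))

# Example usage (hypothetical data):
#   H = adaboost(X_train, y_train, T=100)
#   y_hat = predict(H, X_test)
```

The weight update is the heart of the method: examples the current weak rule gets wrong receive more weight, forcing the next weak learner to focus on them, while the coefficients alpha_t weight each rule by its accuracy in the final vote.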
