AdaBoost and Forward Stagewise Regression are First-Order Convex Optimization Methods

Boosting methods are highly popular and effective supervised learning methods that combine weak learners into a single accurate model with strong statistical performance. In this paper, we analyze two well-known boosting methods, AdaBoost and Incremental Forward Stagewise Regression (FSε), by establishing their precise connections to the Mirror Descent algorithm, a first-order method in convex optimization. As a consequence of these connections, we obtain novel computational guarantees for these boosting methods. In particular, we characterize convergence bounds for AdaBoost, with respect to both the margin and the log-exponential loss function, for any step-size sequence. Furthermore, this paper presents, for the first time, precise computational complexity results for FSε.
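To make the connection concrete, the following is a minimal sketch (not taken from the paper) of AdaBoost run with a user-supplied step-size sequence, written so that the example-weight update is visibly a multiplicative-weights step, i.e., mirror descent with the entropy mirror map over the simplex of example weights. The `weak_learner` and `step_sizes` interfaces are illustrative assumptions, not the paper's notation.

```python
# Minimal sketch (illustrative, not the paper's exact formulation): AdaBoost
# with an arbitrary step-size sequence. The weight update on training examples
# is the multiplicative-weights step, which coincides with mirror descent
# under the entropy mirror map on the probability simplex.
import numpy as np

def adaboost(X, y, weak_learner, step_sizes):
    """Run AdaBoost for len(step_sizes) rounds.

    X: (n, d) feature matrix; y: (n,) labels in {-1, +1}.
    weak_learner(X, y, w) -> callable h with h(X) in {-1, +1}^n,
        fit to the examples weighted by w.
    step_sizes: any sequence of step-sizes alpha_t (e.g. the classical
        adaptive choice 0.5 * log((1 - err_t) / err_t), or a fixed value).
    """
    n = X.shape[0]
    w = np.full(n, 1.0 / n)       # example weights: a point in the simplex
    ensemble = []                 # list of (alpha_t, h_t)

    for alpha in step_sizes:
        h = weak_learner(X, y, w)     # weak hypothesis for current weights
        margins = y * h(X)            # +1 on correct examples, -1 on mistakes
        # Entropy mirror-descent / multiplicative-weights update:
        # w_i <- w_i * exp(-alpha * y_i * h(x_i)), then renormalize.
        w = w * np.exp(-alpha * margins)
        w = w / w.sum()
        ensemble.append((alpha, h))

    def predict(Xnew):
        score = sum(a * h(Xnew) for a, h in ensemble)
        return np.sign(score)

    return predict
```

With the classical adaptive choice alpha_t = 0.5 * log((1 - err_t) / err_t), this reduces to standard AdaBoost; the abstract's guarantees, however, are stated for arbitrary step-size sequences of this kind.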
