Margins, Shrinkage, and Boosting

This manuscript shows that AdaBoost and its immediate variants can produce approximate maximum margin classifiers simply by scaling their step size choices by a fixed small constant. In this way, when the unscaled step size is an optimal choice, these results provide guarantees for Friedman's empirically successful "shrinkage" procedure for gradient boosting (Friedman, 2000). Guarantees are also provided for a variety of other step sizes, affirming the intuition that increasingly regularized line searches yield improved margin guarantees. The results hold for the exponential loss and similar losses, most notably the logistic loss.
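As a concrete illustration of the shrinkage mechanism described above, the following minimal sketch runs AdaBoost on the exponential loss but multiplies every step size by a fixed small constant nu before applying it, then reports the minimum normalized margin of the resulting combination. The dataset, the decision-stump weak learner, and the particular values of nu and the round count are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (illustrative, not the paper's experiments): AdaBoost with
# "shrinkage", i.e. each step size alpha_t is scaled by a fixed small constant nu.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_shrinkage(X, y, rounds=300, nu=0.1):
    """y must take values in {-1, +1}; nu is the shrinkage constant."""
    n = len(y)
    w = np.full(n, 1.0 / n)                # example weights
    stumps, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w @ (pred != y), 1e-12, 1 - 1e-12)
        alpha = nu * 0.5 * np.log((1 - err) / err)   # shrunken step size
        w *= np.exp(-alpha * y * pred)               # exponential-loss reweighting
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def normalized_margins(X, y, stumps, alphas):
    """Normalized margins y * f(x) / ||alpha||_1 of the combined classifier."""
    f = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return y * f / np.sum(np.abs(alphas))

X, y = make_classification(n_samples=200, random_state=0)
y = 2 * y - 1                                        # map {0, 1} -> {-1, +1}
stumps, alphas = adaboost_shrinkage(X, y, rounds=300, nu=0.1)
print("minimum normalized margin:", normalized_margins(X, y, stumps, alphas).min())
```

Note that scikit-learn's AdaBoostClassifier exposes the same constant through its learning_rate parameter, so the sketch above roughly corresponds to setting learning_rate well below 1 there.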

[1]  J. Copas.  Regression, Prediction and Shrinkage, 1983.

[2]  M. Kearns, et al.  Cryptographic limitations on learning Boolean formulae and finite automata, 1989, STOC '89.

[3]  Leonid A. Levin, et al.  A hard-core predicate for all one-way functions, 1989, STOC '89.

[4]  Yoav Freund, et al.  Boosting a weak learning algorithm by majority, 1990, COLT '90.

[5]  Leslie G. Valiant, et al.  Cryptographic Limitations on Learning Boolean Formulae and Finite Automata, 1993, Machine Learning: From Theory to Applications.

[6]  Russell Impagliazzo, et al.  Hard-core distributions for somewhat hard problems, 1995, FOCS '95.

[7]  Yoav Freund, et al.  A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT '95.

[8]  Yoav Freund, et al.  A decision-theoretic generalization of on-line learning and an application to boosting, 1997, J. Comput. Syst. Sci.

[9]  Yoav Freund, et al.  Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML '97.

[10]  Yoram Singer, et al.  Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[11]  Stephen J. Wright, et al.  Numerical Optimization, 2006, Springer.

[12]  J. Friedman.  Greedy function approximation: A gradient boosting machine, 2001, Annals of Statistics.

[13]  Gunnar Rätsch, et al.  Soft Margins for AdaBoost, 2001, Machine Learning.

[14]  J. Steele.  The Cauchy–Schwarz Master Class, 2004, Cambridge University Press.

[15]  Cynthia Rudin, et al.  The Dynamics of AdaBoost: Cyclic Behavior and Convergence of Margins, 2004, J. Mach. Learn. Res.

[16]  Yoram Singer, et al.  Logistic Regression, AdaBoost and Bregman Distances, 2000, Machine Learning.

[17]  Gunnar Rätsch, et al.  Efficient Margin Maximizing with Boosting, 2005, J. Mach. Learn. Res.

[18]  Bin Yu, et al.  Boosting with early stopping: Convergence and consistency, 2005, arXiv:math/0508276.

[19]  Robert E. Schapire, et al.  How boosting the margin can also boost classifier complexity, 2006, ICML '06.

[20]  Gunnar Rätsch, et al.  Totally corrective boosting algorithms that maximize the margin, 2006, ICML '06.

[21]  R. Schapire, et al.  Analysis of boosting algorithms using the smooth margin function, 2007, arXiv:0803.4092.

[22]  Yoram Singer, et al.  On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms, 2010, Machine Learning.

[23]  R. Schapire.  The Convergence Rate of AdaBoost, 2010, COLT '10.

[24]  Gaël Varoquaux, et al.  Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[25]  Matus Telgarsky.  A Primal-Dual Convergence Analysis of Boosting, 2011, J. Mach. Learn. Res.

[26]  Yoav Freund, et al.  Boosting: Foundations and Algorithms, 2012, MIT Press.