The Dynamics of AdaBoost: Cyclic Behavior and Convergence of Margins

To study the convergence properties of the AdaBoost algorithm, we reduce AdaBoost to a nonlinear iterated map and examine the evolution of its weight vectors. This dynamical systems approach allows us to understand AdaBoost's convergence completely in certain cases; in those cases we find stable cycles, which let us solve explicitly for AdaBoost's output. Using this unusual technique, we show that AdaBoost does not always converge to a maximum margin combined classifier, answering an open question. In addition, we show that "non-optimal" AdaBoost (where the weak learning algorithm does not necessarily choose the best weak classifier at each iteration) may fail to converge to a maximum margin classifier even when "optimal" AdaBoost produces a maximum margin. Finally, we show that if AdaBoost cycles, it cycles among "support vectors", i.e., examples that all achieve the same smallest margin.
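
The reduction to an iterated map is easy to make concrete. Writing M for the m x n matrix with entries M_{ij} = y_i h_j(x_i), AdaBoost's weight vector evolves as d_{t+1,i} = d_{t,i} exp(-alpha_t M_{i,j_t}) / Z_t, where j_t is the weak classifier chosen at round t, alpha_t = (1/2) ln((1+r_t)/(1-r_t)) with edge r_t = sum_i d_{t,i} M_{i,j_t}, and Z_t renormalizes the weights. The following is a minimal sketch (not the paper's code) of "optimal" AdaBoost viewed as this iterated map; the encoding of weak classifiers as +/-1 columns of M is an assumption of the illustration:

```python
import numpy as np

def adaboost_map(M, T=1000):
    """Iterate AdaBoost's weight-vector map for T rounds.

    M is an m x n matrix with M[i, j] = y_i * h_j(x_i) in {-1, +1}
    (rows are training examples, columns are weak classifiers; this
    encoding is assumed for the sketch).
    """
    m, n = M.shape
    d = np.full(m, 1.0 / m)        # uniform starting weights over examples
    lam = np.zeros(n)              # cumulative coefficient of each weak classifier
    trajectory = [d.copy()]
    for _ in range(T):
        edges = M.T @ d            # edge r of every weak classifier under d
        j = int(np.argmax(edges))  # "optimal" AdaBoost picks the best one
        r = edges[j]
        if r >= 1.0:               # a perfect weak classifier; alpha diverges
            break
        alpha = 0.5 * np.log((1.0 + r) / (1.0 - r))
        d = d * np.exp(-alpha * M[:, j])
        d /= d.sum()               # renormalization makes the map nonlinear
        lam[j] += alpha
        trajectory.append(d.copy())
    margins = M @ lam / lam.sum()  # normalized margin of each example
    return np.array(trajectory), margins
```

For example, on the 3 x 3 matrix in which each weak classifier misclassifies exactly one example (-1 on the diagonal, +1 elsewhere), the weight trajectory returned above settles into cyclic behavior rather than a fixed point, and all three examples attain the same smallest margin, consistent with the support-vector claim in the abstract.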
