How boosting the margin can also boost classifier complexity

Boosting methods are known to usually avoid overfitting the training data, even as the generated classifiers grow large. Schapire et al. attempted to explain this phenomenon in terms of the margins the classifier achieves on training examples. Later, however, Breiman cast serious doubt on this explanation by introducing a boosting algorithm, arc-gv, that can generate a higher margin distribution than AdaBoost and yet perform worse. In this paper, we take a close look at Breiman's compelling but puzzling results. Although we can reproduce his main finding, we find that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by our experiments and entirely consistent with the margins theory. Thus, we find that maximizing the margins is desirable, but not necessarily at the expense of other factors, especially base-classifier complexity.
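For readers unfamiliar with the term, the margin of a training example under a weighted-vote classifier is commonly defined as the label times the weighted vote, divided by the total absolute weight, so it lies in [-1, 1] and is positive exactly when the example is classified correctly. The following is a minimal sketch of that computation, not code from the paper: the function name `normalized_margins`, the toy predictions, and the weights are all hypothetical, and labels and base-classifier outputs are assumed to be in {-1, +1}.

```python
def normalized_margins(base_predictions, alphas, labels):
    """Return the normalized margin of each training example.

    base_predictions: list of T lists, each holding one base classifier's
                      outputs in {-1, +1} for the n examples (assumed layout).
    alphas:           list of T vote weights.
    labels:           list of n true labels in {-1, +1}.
    """
    total_weight = sum(abs(a) for a in alphas)
    if total_weight == 0:
        raise ValueError("at least one vote weight must be non-zero")

    margins = []
    for i, y in enumerate(labels):
        # Weighted vote of all base classifiers on example i.
        vote = sum(a * h[i] for a, h in zip(alphas, base_predictions))
        # Margin: positive if correct, magnitude reflects vote confidence.
        margins.append(y * vote / total_weight)
    return margins


# Hypothetical toy ensemble: three base classifiers, four examples.
toy_predictions = [
    [+1, +1, -1, -1],
    [+1, -1, -1, +1],
    [+1, +1, +1, -1],
]
toy_alphas = [0.5, 0.3, 0.2]
toy_labels = [+1, +1, -1, -1]

toy_margins = normalized_margins(toy_predictions, toy_alphas, toy_labels)
minimum_margin = min(toy_margins)
```

Sorting `toy_margins` gives the empirical margin distribution; the smallest entry is the minimum margin, the quantity that arc-gv is designed to push up.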
