A Refined Margin Analysis for Boosting Algorithms via Equilibrium Margin

Much attention has been paid to the theoretical explanation of the empirical success of AdaBoost. The most influential work is the margin theory, which is essentially an upper bound on the generalization error of any voting classifier in terms of the margin distribution over the training data. However, important questions have been raised about the margin explanation. Breiman (1999) proved a bound in terms of the minimum margin, which is sharper than the margin distribution bound. He argued that the minimum margin would be a better predictor of the generalization error. Grove and Schuurmans (1998) developed an algorithm called LP-AdaBoost, which maximizes the minimum margin while keeping all other factors the same as AdaBoost. In experiments, however, LP-AdaBoost usually performs worse than AdaBoost, casting serious doubt on the margin explanation. In this paper, we give a refined analysis of the margin theory. We prove a bound in terms of a new margin measure called the Equilibrium margin (Emargin). The Emargin bound is uniformly sharper than Breiman's minimum margin bound. Thus our result suggests that the minimum margin may not be crucial for the generalization error. We also show that a large Emargin together with a small empirical error at the Emargin implies a smaller bound on the generalization error. Experimental results on benchmark data sets demonstrate that AdaBoost usually has a larger Emargin and a smaller test error than LP-AdaBoost, which agrees well with our theory.
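
To fix ideas, the following is a minimal sketch of the quantities discussed above, in assumed notation (f denotes the voting classifier, h_t the base classifiers with voting weights \alpha_t, m the training sample size, d the VC dimension of the base classifier class, \theta a margin threshold, and \delta a confidence parameter); it is an illustration of the standard setting rather than a statement taken from the paper. A voting classifier and its margins are

    f(x) = \sum_t \alpha_t h_t(x), \qquad \alpha_t \ge 0, \quad \sum_t \alpha_t = 1, \qquad \mathrm{margin}(x_i, y_i) = y_i f(x_i) \in [-1, 1].

Up to constants, the margin distribution bound of Schapire et al. [34] states that, with probability at least 1 - \delta over a training sample S of size m, for every \theta > 0,

    P_{\mathcal{D}}\big[ y f(x) \le 0 \big] \;\le\; P_{S}\big[ y f(x) \le \theta \big] + O\!\left( \sqrt{ \frac{d \log^2(m/d)}{m \theta^2} + \frac{\log(1/\delta)}{m} } \right),

where P_{\mathcal{D}} denotes the generalization error and P_S the fraction of training examples with margin at most \theta. Breiman's minimum margin bound [28] corresponds, roughly, to taking \theta to be the minimum margin \min_i y_i f(x_i), at which the empirical term vanishes, while the Emargin bound uses a data-dependent threshold together with the empirical error at that threshold.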

[1] R. Schapire, et al. Analysis of boosting algorithms using the smooth margin function, 2007, 0803.4092.

[2] W. Hoeffding. Probability inequalities for sums of bounded random variables, 1963.

[3] G. Lugosi, et al. On the Bayes-risk consistency of regularized boosting methods, 2003.

[4] Cynthia Rudin, et al. The Dynamics of AdaBoost: Cyclic Behavior and Convergence of Margins, 2004, J. Mach. Learn. Res.

[5] L. Devroye. Bounds for the Uniform Deviation of Empirical Measures, 1982.

[6] Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization, 2000, Machine Learning.

[7] Eric Bauer, et al. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants, 1999, Machine Learning.

[9] J. Ross Quinlan. Bagging, Boosting, and C4.5, 1996, AAAI/IAAI, Vol. 1.

[10] Norbert Sauer. On the Density of Families of Sets, 1972, J. Comb. Theory A.

[11] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[12] Vladimir Vapnik. Statistical learning theory, 1998.

[13] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization, 2003.

[14] Liwei Wang, et al. On learning with dissimilarity functions, 2007, ICML '07.

[15] J. Langford. Tutorial on Practical Prediction Theory for Classification, 2005, J. Mach. Learn. Res.

[16] A. J. Grove, D. Schuurmans. Boosting in the Limit: Maximizing the Margin of Learned Ensembles, 1998, AAAI/IAAI.

[17] Gunnar Rätsch, et al. An Introduction to Boosting and Leveraging, 2002, Machine Learning Summer School.

[18] Yoav Freund, et al. Experiments with a New Boosting Algorithm, 1996, ICML.

[19] Wenxin Jiang. Process consistency for AdaBoost, 2003.

[20] Paul A. Viola, et al. Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001).

[21] Robert E. Schapire, et al. How boosting the margin can also boost classifier complexity, 2006, ICML.

[22] David Mease, et al. Evidence Contrary to the Statistical View of Boosting, 2008, J. Mach. Learn. Res.

[23] L. Breiman. Population theory for boosting ensembles, 2003.

[24] Gunnar Rätsch, et al. Efficient Margin Maximizing with Boosting, 2005, J. Mach. Learn. Res.

[25] V. Vapnik, A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[26] Zhi-Hua Zhou, et al. On the Margin Explanation of Boosting Algorithms, 2008, COLT.

[27] Rich Caruana, et al. An empirical comparison of supervised learning algorithms, 2006, ICML.

[28] Leo Breiman. Prediction Games and Arcing Algorithms, 1999, Neural Computation.

[29] Y. Freund, et al. Discussion of the paper "Additive Logistic Regression: A Statistical View of Boosting", 2000.

[30] V. Koltchinskii, et al. Complexities of convex combinations and bounding the generalization error in classification, 2004, math/0405356.

[32] Michael I. Jordan, et al. Convexity, Classification, and Risk Bounds, 2006.

[33] V. Koltchinskii, et al. Empirical margin distributions and bounding the generalization error of combined classifiers, 2002, math/0405343.

[34] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.