Recent work has shown that adaptively reweighting the training set, growing a classifier using the new weights, and combining the classifiers constructed to date can significantly decrease generalization error. Breiman [1996] called procedures of this type arcing. The first successful arcing procedure, AdaBoost, was introduced by Freund and Schapire [1995, 1996]. In an effort to explain why AdaBoost works, Schapire et al. [1997] derived a bound on the generalization error of a convex combination of classifiers in terms of the margin. We introduce a function called the edge, which differs from the margin only if there are more than two classes. A framework for understanding arcing algorithms is defined. In this framework, the arcing algorithms currently in the literature are seen to be optimization algorithms that minimize some function of the edge. A relation is derived between the optimal reduction in the maximum value of the edge and the PAC concept of a weak learner. Two algorithms that achieve the optimal reduction are described. Tests on both synthetic and real data cast doubt on the explanation of Schapire et al.
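The arcing recipe summarized above (reweight, refit, combine) is easiest to see in the two-class AdaBoost case. The sketch below is a minimal illustration, not the paper's own code: it assumes labels in {-1, +1}, decision stumps as the base classifiers, and hypothetical helper names such as `fit_stump` and `voted_margins`. The last function reports the normalized margin of the voted combination, which in the two-class case carries the same information as the edge (edge = (1 - margin) / 2).

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: pick the (feature, threshold, polarity)
    with the smallest weighted training error under weights w."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = polarity * np.where(X[:, j] <= t, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, t, polarity), err
    return best, best_err

def stump_predict(stump, X):
    j, t, polarity = stump
    return polarity * np.where(X[:, j] <= t, 1, -1)

def adaboost(X, y, n_rounds=50):
    """Arcing loop in the AdaBoost style: reweight the training set,
    grow a new classifier on the new weights, and add it to the vote."""
    n = len(y)
    w = np.full(n, 1.0 / n)                 # start from uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, None)     # guard against a perfect stump
        if err >= 0.5:                      # weak-learning condition violated
            break
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred)      # up-weight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def voted_margins(stumps, alphas, X, y):
    """Normalized margin y * F(x) of the convex combination; for two
    classes the edge is (1 - margin) / 2."""
    F = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return y * F / alphas.sum()

# Illustrative run on synthetic two-class data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
stumps, alphas = adaboost(X, y, n_rounds=20)
print("smallest training margin:", voted_margins(stumps, alphas, X, y).min())
```

The reweighting step is the part common to all arcing procedures; the particular exponential update and alpha formula above are the AdaBoost choices, shown here only as one concrete instance of minimizing a function of the edge.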
[1] Thomas G. Dietterich et al. Error-Correcting Output Coding Corrects Bias and Variance. ICML, 1995.
[2] Corinna Cortes et al. Boosting Decision Trees. NIPS, 1995.
[3] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1995.
[4] Yoav Freund and Robert E. Schapire. Experiments with a New Boosting Algorithm. ICML, 1996.
[5] J. Ross Quinlan. Bagging, Boosting, and C4.5. AAAI/IAAI, Vol. 1, 1996.
[6] Leo Breiman. Bias, Variance, and Arcing Classifiers. 1996.
[7] Leo Breiman. Bagging Predictors. Machine Learning, 1996.