Prediction Games and Arcing Algorithms

The theory behind the success of adaptive reweighting and combining algorithms (arcing) such as Adaboost (Freund & Schapire, 1996a, 1997) and others in reducing generalization error has not been well understood. By formulating prediction as a game where one player makes a selection from instances in the training set and the other a convex linear combination of predictors from a finite set, existing arcing algorithms are shown to be algorithms for finding good game strategies. The minimax theorem is an essential ingredient of the convergence proofs. An arcing algorithm is described that converges to the optimal strategy. A bound on the generalization error for the combined predictors in terms of their maximum error is proven that is sharper than bounds to date. Schapire, Freund, Bartlett, and Lee (1997) offered an explanation of why Adaboost works in terms of its ability to produce generally high margins. The empirical comparison of Adaboost to the optimal arcing algorithm shows that their explanation is not complete.