Generalization bounds for averaged classifiers

We study a simple learning algorithm for binary classification. Instead of predicting with the single best hypothesis in the class, that is, the hypothesis that minimizes the training error, our algorithm predicts with an average of all hypotheses, weighted exponentially with respect to their training errors. We show that the prediction of this algorithm is much more stable than that of an algorithm that predicts with the single best hypothesis. By allowing the algorithm to abstain from predicting on some examples, we show that the predictions it does make are highly reliable. Finally, we show that the probability that the algorithm abstains is comparable to the generalization error of the best hypothesis in the class.
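To make the idea concrete, here is a minimal sketch of such an exponentially weighted averaged classifier over a finite hypothesis class. The parameter names (eta for the weighting temperature, delta for the abstention margin) and the interface are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def averaged_classifier(hypotheses, X_train, y_train, X_test, eta=1.0, delta=0.1):
    """Predict with an exponentially weighted average of hypotheses.

    hypotheses: list of callables mapping an example array to {-1, +1} labels.
    eta:   weighting temperature (hypothetical parameter name).
    delta: abstention margin (hypothetical parameter name).
    Returns predictions in {-1, +1, 0}, where 0 means "abstain".
    """
    n = len(y_train)
    # Empirical (training) error of each hypothesis.
    errs = np.array([np.mean(h(X_train) != y_train) for h in hypotheses])
    # Exponential weights: lower training error means exponentially more weight.
    w = np.exp(-eta * n * errs)
    w /= w.sum()
    # Weighted vote on each test example; a value in [-1, +1].
    votes = sum(wi * h(X_test) for wi, h in zip(w, hypotheses))
    preds = np.sign(votes)
    # Abstain whenever the weighted vote is too close to a tie.
    preds[np.abs(votes) < delta] = 0
    return preds
```

The key design point the abstract emphasizes is visible here: because the prediction is an average rather than an argmin, a small perturbation of the training set perturbs the weights smoothly instead of switching discontinuously between hypotheses, and the margin |votes| gives a natural confidence signal that drives the abstention rule.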
