Boosting as entropy projection

We consider the AdaBoost procedure for boosting weak learners. In AdaBoost, a key step is choosing a new distribution on the training examples based on the old distribution and the mistakes made by the current weak hypothesis. We show how AdaBoost’s choice of the new distribution can be seen as an approximate solution to the following problem: find the new distribution that is closest to the old distribution, subject to the constraint that the new distribution is orthogonal to the mistake vector of the current weak hypothesis. The distance (or divergence) between distributions is measured by the relative entropy. Equivalently, AdaBoost approximately projects the distribution vector onto the hyperplane defined by the mistake vector. We show that this entropy-projection view of AdaBoost is dual to the usual view of AdaBoost as minimizing the normalization factors of the updated distributions.
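To make the correspondence concrete, the sketch below (a numerical illustration, not code from the paper) contrasts AdaBoost’s multiplicative update with an exact relative-entropy projection onto the hyperplane orthogonal to the mistake vector, computed by solving for the Lagrange multiplier numerically. The function names adaboost_update and entropy_projection, the toy data, and the use of NumPy/SciPy are assumptions made for illustration; for ±1-valued weak hypotheses the two updates coincide.

```python
import numpy as np
from scipy.optimize import brentq

def adaboost_update(d, u):
    """AdaBoost-style multiplicative update.

    d : current distribution over training examples (sums to 1)
    u : mistake vector, u_i = y_i * h(x_i) in {-1, +1}
        (+1 if the current weak hypothesis is correct on example i)
    Assumes 0 < weighted error < 1/2 so that alpha is finite.
    """
    eps = d[u == -1].sum()                 # weighted error of h under d
    alpha = 0.5 * np.log((1 - eps) / eps)  # AdaBoost's choice of coefficient
    d_new = d * np.exp(-alpha * u)
    return d_new / d_new.sum()             # divide by normalization factor Z

def entropy_projection(d, u):
    """Exact relative-entropy projection of d onto {p : sum_i p_i u_i = 0}.

    By Lagrangian duality the minimizer has the exponential form
    p_i proportional to d_i * exp(-a * u_i); we root-find the multiplier a
    so that the orthogonality constraint holds.
    """
    def edge(a):
        p = d * np.exp(-a * u)
        p /= p.sum()
        return p @ u                        # edge of h under candidate p
    a = brentq(edge, -50.0, 50.0)           # edge(a) is decreasing in a
    p = d * np.exp(-a * u)
    return p / p.sum()

# Toy example: 5 training points, weak hypothesis wrong on two of them.
d0 = np.full(5, 0.2)
u = np.array([+1, +1, +1, -1, -1])
print(adaboost_update(d0, u))      # wrong examples get weight up, correct down
print(entropy_projection(d0, u))   # same distribution for a ±1-valued hypothesis
```

Under the new distribution the current weak hypothesis has zero edge (weighted error exactly 1/2), which is precisely the orthogonality constraint described in the abstract.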
