Boosting in the Limit: Maximizing the Margin of Learned Ensembles

The "minimum margin" of an ensemble classifier on a given training set is, roughly speaking, the smallest vote it gives to any correct training label. Recent work has shown that the Adaboost algorithm is particularly effective at producing ensembles with large minimum margins, and theory suggests that this may account for its success at reducing generalization error. We note, however, that the problem of finding good margins is closely related to linear programming, and we use this connection to derive and test new "LPboosting" algorithms that achieve better minimum margins than Adaboost. However, these algorithms do not always yield better generalization performance; in fact, more often the opposite is true. We report on a series of controlled experiments which show that no simple version of the minimum-margin story can be complete. We conclude that the crucial question as to why boosting works so well in practice, and how to further improve upon it, remains mostly open. Some of our experiments are interesting for another reason: we show that Adaboost sometimes does overfit, eventually. This may take a very long time to occur, however, which is perhaps why this phenomenon has gone largely unnoticed.
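The connection to linear programming mentioned above can be sketched concretely. Maximizing the minimum margin of a voting ensemble is the LP: maximize rho subject to sum_t w_t y_i h_t(x_i) >= rho for every training example i, with w >= 0 and sum_t w_t = 1. The following is a minimal illustration using SciPy's generic LP solver on a hypothetical toy margins matrix; it is not the authors' LPboosting implementation, which works with the weak hypotheses produced during boosting.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy data: M[i, t] = y_i * h_t(x_i), i.e. +1 if weak
# hypothesis t classifies training example i correctly, -1 otherwise.
M = np.array([
    [ 1,  1, -1],
    [ 1, -1,  1],
    [-1,  1,  1],
    [ 1,  1,  1],
], dtype=float)
n, T = M.shape

# Decision variables: x = [w_1, ..., w_T, rho].
# linprog minimizes c @ x, so maximizing rho means c = [0, ..., 0, -1].
c = np.zeros(T + 1)
c[-1] = -1.0

# Margin constraints  rho - M @ w <= 0  (one row per training example).
A_ub = np.hstack([-M, np.ones((n, 1))])
b_ub = np.zeros(n)

# Convexity constraint: the weights form a distribution, sum(w) = 1.
A_eq = np.zeros((1, T + 1))
A_eq[0, :T] = 1.0
b_eq = np.array([1.0])

# w >= 0; rho is unbounded (it may be negative if no separator exists).
bounds = [(0, None)] * T + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
w, rho = res.x[:T], res.x[-1]
print("weights:", w)
print("maximum achievable minimum margin:", rho)  # 1/3 for this toy M
```

For this toy matrix the optimum is rho = 1/3 (the first three margin rows sum to exactly sum(w) = 1, so their minimum can never exceed 1/3, and uniform weights attain it), which matches the paper's point that the best min-margin weighting is a well-defined LP optimum that an algorithm like Adaboost may or may not reach.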
