Boosting Neural Networks

Boosting is a general method for improving the performance of learning algorithms. A recently proposed boosting algorithm, AdaBoost, has been applied with great success to several benchmark machine learning problems, mainly using decision trees as base classifiers. In this article we investigate whether AdaBoost works as well with neural networks, and we discuss the advantages and drawbacks of different versions of the AdaBoost algorithm. In particular, we compare training methods based on sampling the training set and on weighting the cost function. The results suggest that random resampling of the training data is not the main explanation for the improvements brought by AdaBoost. This is in contrast to bagging, which directly aims at reducing variance and for which random resampling is essential to obtain the reduction in generalization error. Our system achieves about 1.4% error on a data set of on-line handwritten digits from more than 200 writers. A boosted multilayer network achieved 1.5% error on the UCI letters data set and 8.1% error on the UCI satellite data set, which is significantly better than boosted decision trees.
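
The distinction between sampling the training set and weighting the cost function can be illustrated with a minimal sketch. It assumes the textbook AdaBoost.M1 reweighting rule, which may differ from the exact variants studied in the article; the function names (`adaboost_update`, `resample_indices`, `weighted_cost`) are hypothetical and chosen only for illustration.

```python
import math
import random


def adaboost_update(weights, mistakes):
    """One AdaBoost.M1-style update of the example distribution (assumed variant).

    weights  -- current distribution over training examples (sums to 1)
    mistakes -- mistakes[i] is True if the current classifier errs on example i
    Returns (new_weights, alpha), where alpha is the classifier's voting weight.
    """
    eps = sum(w for w, m in zip(weights, mistakes) if m)  # weighted error
    if not 0.0 < eps < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    beta = eps / (1.0 - eps)
    # Down-weight correctly classified examples, then renormalize.
    new = [w * (1.0 if m else beta) for w, m in zip(weights, mistakes)]
    total = sum(new)
    return [w / total for w in new], math.log(1.0 / beta)


def resample_indices(weights, n, rng):
    """Option (a): draw n training indices according to the distribution."""
    return rng.choices(range(len(weights)), weights=weights, k=n)


def weighted_cost(weights, per_example_loss):
    """Option (b): keep every example and weight its term in the cost."""
    return sum(w * loss for w, loss in zip(weights, per_example_loss))


# Four examples, uniform start, the first one misclassified.
new_weights, alpha = adaboost_update([0.25] * 4, [True, False, False, False])
sample = resample_indices(new_weights, n=4, rng=random.Random(0))
cost = weighted_cost(new_weights, [1.0, 0.0, 0.0, 0.0])
```

In option (a) the next network is trained on a resampled set in which hard examples tend to appear more often; in option (b) it sees every example once, and the distribution enters only through the cost. Both use the same distribution, so comparing them isolates the effect of the random resampling itself.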
