Soft Margins for AdaBoost

Recently, ensemble methods such as AdaBoost have been applied successfully to many problems, while seemingly defying the problem of overfitting. AdaBoost rarely overfits in the low-noise regime; however, we show that it clearly does so for higher noise levels. Central to understanding this fact is the margin distribution. AdaBoost can be viewed as a constrained gradient descent on an error function defined in terms of the margin. We find that AdaBoost asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns that are, interestingly, very similar to support vectors. A hard margin is clearly a sub-optimal strategy in the noisy case, and regularization, in our case a "mistrust" in the data, must be introduced into the algorithm to alleviate the distortions that single difficult patterns (e.g. outliers) can cause in the margin distribution. We propose several regularization methods and generalizations of the original AdaBoost algorithm to achieve a soft margin. In particular, we suggest (1) regularized AdaBoost_Reg, where the gradient descent is performed directly with respect to the soft margin, and (2) regularized linear and quadratic programming (LP/QP-)AdaBoost, where the soft margin is attained by introducing slack variables. Extensive simulations demonstrate that the proposed regularized AdaBoost-type algorithms are useful and yield competitive results for noisy data.
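To make the margin-based view concrete, the following sketch writes out the normalized margin of an ensemble and one plausible form of the soft margin. The symbols (alpha_t for hypothesis weights, h_t for base hypotheses, C for a regularization constant, mu(z_i) for a per-pattern mistrust term) are notational assumptions on our part and are not fixed by the abstract itself.

```latex
% Normalized margin of pattern z_i = (x_i, y_i) under the ensemble:
% positive when the weighted vote classifies the pattern correctly.
\[
  \rho(z_i) \;=\; y_i \,
    \frac{\sum_{t=1}^{T} \alpha_t\, h_t(x_i)}{\sum_{t=1}^{T} \alpha_t}
\]
% Soft margin (assumed form): the hard margin plus a mistrust term
% mu(z_i), weighted by C, so that a single difficult pattern cannot
% force the ensemble to concentrate all its resources on it.
\[
  \tilde{\rho}(z_i) \;=\; \rho(z_i) \;+\; C\,\mu(z_i)
\]
```

Descending on an error function of the soft margin, rather than the hard margin, is what allows the algorithm to "give up" on outliers instead of driving their margins positive at any cost.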

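As a rough illustration of the mechanics described above, here is a minimal, self-contained AdaBoost on decision stumps in numpy. The `c_mistrust` knob is a hypothetical illustration of the soft-margin idea, discounting patterns that have already drawn a lot of weight; it is a sketch only, not the AdaBoost_Reg update from the paper.

```python
# Minimal AdaBoost on decision stumps, with a hypothetical soft-margin knob.
import numpy as np

def stump_predict(X, feat, thresh, sign):
    """Decision stump: sign * (+1 if X[:, feat] > thresh else -1)."""
    return sign * np.where(X[:, feat] > thresh, 1.0, -1.0)

def fit_stump(X, y, w):
    """Exhaustively pick the stump minimizing the weighted training error."""
    best = (np.inf, 0, 0.0, 1)
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1, -1):
                err = np.sum(w * (stump_predict(X, feat, thresh, sign) != y))
                if err < best[0]:
                    best = (err, feat, thresh, sign)
    return best

def adaboost(X, y, rounds=20, c_mistrust=0.0):
    """AdaBoost; c_mistrust > 0 adds an illustrative soft-margin penalty."""
    n = len(y)
    w = np.full(n, 1.0 / n)   # pattern weights
    mu = np.zeros(n)          # accumulated influence of each pattern
    ensemble = []
    for _ in range(rounds):
        err, feat, thresh, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, feat, thresh, sign)
        mu += alpha * w       # track how much weight each pattern has drawn
        # Standard exponential reweighting, plus a mistrust term that keeps
        # chronically hard patterns (outlier candidates) from dominating.
        w *= np.exp(-alpha * y * pred + c_mistrust * mu)
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, sign))
    return ensemble

def predict(ensemble, X):
    """Sign of the weighted vote of all stumps in the ensemble."""
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble)
    return np.sign(score)
```

With `c_mistrust=0.0` the loop reduces to standard AdaBoost, which asymptotically enforces a hard margin; a small positive value caps how much weight any single pattern can accumulate, mimicking the "mistrust in the data" that the abstract argues is necessary for noisy problems.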