Unifying multi-class AdaBoost algorithms with binary base learners under the margin framework

The multi-class AdaBoost algorithms AdaBoost.MO, AdaBoost.ECC, and AdaBoost.OC have received considerable attention in the literature, but the relationships among them have not been fully examined to date. In this paper, we present a novel interpretation of the three algorithms, showing that MO and ECC perform stage-wise functional gradient descent on a cost function defined over margin values, and that OC is a shrinkage version of ECC. This allows us to rigorously explain the properties of ECC and OC that have been empirically observed in prior work. The interpretation also leads us to introduce shrinkage as regularization in MO and ECC, yielding two new algorithms: SMO and SECC. Experiments on diverse datasets demonstrate the effectiveness of the proposed algorithms and validate our theoretical findings.
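The shrinkage mechanism described above can be illustrated with a minimal sketch of stage-wise functional gradient descent on the exponential margin cost (the standard AdaBoost cost; the abstract only specifies a margin-based cost). This is not the paper's SMO/SECC procedure: the binary base-learner setting, the function name, and the parameter `nu` are illustrative assumptions, chosen to show how a shrinkage factor damps each functional step.

```python
# Illustrative sketch (not the paper's exact algorithm): boosting as
# stage-wise functional gradient descent on exp(-y * F(x)), with a
# shrinkage factor `nu` regularizing each step. nu=1.0 recovers the
# unshrunk AdaBoost-style update. Binary labels y in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_with_shrinkage(X, y, n_rounds=100, nu=0.5):
    F = np.zeros(len(y))            # current ensemble score F(x)
    learners, alphas = [], []
    for _ in range(n_rounds):
        w = np.exp(-y * F)          # per-example weights from the margin cost
        w /= w.sum()
        # Binary base learner (a decision stump) fit to the weighted data.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = w[pred != y].sum()
        if err >= 0.5:              # base learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        learners.append(h)
        alphas.append(nu * alpha)   # shrinkage: damp the functional step
        F += nu * alpha * pred
    return learners, alphas

def predict(learners, alphas, X):
    F = sum(a * h.predict(X) for h, a in zip(learners, alphas))
    return np.sign(F)
```

With `nu < 1` each round moves only part of the way along the steepest-descent direction, so more rounds are needed but the margin distribution is regularized; this is the sense in which OC can be read as a shrinkage version of ECC.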
