Multi-class AdaBoost

Boosting has been a very successful technique for solving the two-class classification problem. In going from two-class to multi-class classification, most algorithms have been restricted to reducing the multi-class classification problem to multiple two-class problems. In this paper, we propose a new algorithm that naturally extends the original AdaBoost algorithm to the multi-class case without reducing it to multiple two-class problems. As with AdaBoost in the two-class case, this new algorithm combines weak classifiers and only requires that the performance of each weak classifier be better than random guessing, i.e., accuracy above 1/K for a K-class problem rather than above 1/2. We further provide a statistical justification for the new algorithm using a novel multi-class exponential loss function and forward stage-wise additive modeling. As shown in the paper, the new algorithm is extremely easy to implement and is highly competitive with the best currently available multi-class classification methods.
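
The proposed update (the SAMME algorithm) differs from binary AdaBoost only in the classifier weight, which gains an extra log(K - 1) term so that it stays positive whenever the weak classifier beats 1/K random guessing. The following is a minimal sketch of that reweighting loop, not the paper's reference implementation; the synthetic dataset, the choice of depth-one decision trees as weak learners, and the number of rounds M are illustrative assumptions.

```python
# Minimal sketch of SAMME-style multi-class AdaBoost.
# Assumptions (not from the paper): synthetic data, decision stumps, M = 50.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
n, K, M = len(y), len(np.unique(y)), 50

w = np.full(n, 1.0 / n)            # uniform observation weights
learners, alphas = [], []
for m in range(M):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    miss = stump.predict(X) != y   # indicator of misclassified points
    err = np.dot(w, miss) / w.sum()
    if err >= 1.0 - 1.0 / K:       # no better than 1/K random guessing: stop
        break
    # SAMME weight: the extra log(K - 1) term keeps alpha positive whenever
    # the weak classifier beats random guessing on K classes (not 1/2).
    alpha = np.log((1.0 - err) / max(err, 1e-10)) + np.log(K - 1.0)
    w *= np.exp(alpha * miss)      # up-weight misclassified observations
    w /= w.sum()
    learners.append(stump)
    alphas.append(alpha)

# Final classifier: alpha-weighted vote over the weak learners.
votes = np.zeros((n, K))
for alpha, h in zip(alphas, learners):
    votes[np.arange(n), h.predict(X)] += alpha
print("training accuracy:", np.mean(votes.argmax(axis=1) == y))
```

With depth-one trees the condition err < 1 - 1/K is usually easy to meet even when K is large, which is exactly the weakened requirement on weak classifiers that the abstract emphasizes.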
