A robust multi-class AdaBoost algorithm for mislabeled noisy data

AdaBoost has been shown, both theoretically and empirically, to be a highly successful ensemble learning algorithm: it iteratively generates a set of diverse weak learners and combines their outputs by weighted majority voting to form the final decision. However, AdaBoost can overfit, especially when the training examples are mislabeled (noisy), which degrades both its generalization performance and its robustness. Recently, a representative approach named noise-detection based AdaBoost (ND_AdaBoost) was proposed to improve the robustness of AdaBoost in the two-class classification scenario. In the multi-class scenario, however, this approach can hardly achieve satisfactory performance, for three reasons. (1) If we decompose a multi-class classification problem with strategies such as one-versus-all or one-versus-one, the resulting two-class problems usually have imbalanced training sets, which harms the performance of ND_AdaBoost. (2) If we apply ND_AdaBoost directly to the multi-class scenario, its two-class loss function no longer applies, and its accuracy requirement on the (weak) base classifiers, i.e., greater than 0.5, is too strong to be satisfied in most cases. (3) ND_AdaBoost still tends to overfit, because it increases the weights of correctly classified noisy examples, which can make it focus on learning these noisy examples in subsequent iterations. To address these issues, in this paper we propose a robust multi-class AdaBoost algorithm (Rob_MulAda) whose key ingredients are a noise-detection based multi-class loss function and a new weight-updating scheme. Experimental study indicates that the newly proposed weight-updating scheme is indeed more robust to mislabeled noise than that of ND_AdaBoost in both two-class and multi-class scenarios. In addition, comparison experiments verify the effectiveness of Rob_MulAda and suggest how to choose the most appropriate noise-alleviating approach according to the concrete noise level in practical applications.
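
The abstract does not reproduce the exact loss function or weight-updating rule of Rob_MulAda, so the following is only a minimal sketch of a standard SAMME-style multi-class AdaBoost training loop with an illustrative noise-aware tweak: examples flagged by a hypothetical external noise detector (the `noisy_mask` argument) are exempted from weight increases. The function names and the detector interface are assumptions for illustration, not the paper's method; note that the base-classifier accuracy requirement is relaxed from 0.5 to better-than-random guessing (error < 1 - 1/K), which is the usual multi-class relaxation.

```python
# Minimal SAMME-style multi-class AdaBoost sketch (NumPy + scikit-learn).
# The noise handling is only illustrative: examples flagged as noisy by a
# user-supplied detector do not receive weight increases. This is NOT the
# Rob_MulAda update rule described in the paper.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def samme_fit(X, y, n_rounds=50, noisy_mask=None):
    # Assumes integer class labels 0..K-1.
    n, K = len(y), len(np.unique(y))
    w = np.full(n, 1.0 / n)                      # example weights
    learners, alphas = [], []
    if noisy_mask is None:
        noisy_mask = np.zeros(n, dtype=bool)     # no examples flagged as noisy
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        miss = pred != y
        err = np.dot(w, miss) / w.sum()
        if err >= 1.0 - 1.0 / K:                 # must beat random guessing, not 0.5
            break
        err = max(err, 1e-10)
        alpha = np.log((1.0 - err) / err) + np.log(K - 1.0)
        # Standard SAMME update: up-weight misclassified examples ...
        w *= np.exp(alpha * miss)
        # ... but undo the increase for flagged-noisy ones (illustrative tweak).
        w[noisy_mask & miss] /= np.exp(alpha)
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def samme_predict(learners, alphas, X, K):
    # Weighted majority vote over the base classifiers' predicted classes.
    votes = np.zeros((len(X), K))
    for stump, alpha in zip(learners, alphas):
        votes[np.arange(len(X)), stump.predict(X)] += alpha
    return votes.argmax(axis=1)
```

In this sketch the ensemble decision is exactly the weighted majority vote described above, and robustness to label noise hinges entirely on how the weight update treats suspected-noisy examples, which is the component Rob_MulAda redesigns.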
