Class noise removal and correction for image classification using ensemble margin

Mislabeled training data make it difficult to build a robust classifier, whether or not that classifier is an ensemble. This work addresses the mislabeling problem by exploiting four different ensemble margins to identify, and then either eliminate or correct, mislabeled training data. Our approach is based on class noise ordering and relies on the margin values of misclassified data. The effectiveness of our ordering-based class noise removal and correction methods is demonstrated on image classification. A comparative analysis is conducted against the majority vote filter, a reference ensemble-based class noise filter.
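As an illustration of the ordering idea, the following is a minimal sketch and not the authors' implementation. It assumes one common supervised margin definition (votes for the given label minus the largest vote count among the other classes, divided by the ensemble size); the abstract does not specify the four margins. It also assumes that misclassified instances with the most negative margin are the most suspicious, and that a user-chosen fraction of the ranked instances is treated as noise. The function names and the `fraction` parameter are hypothetical.

```python
import numpy as np

def ensemble_margins(votes, labels, n_classes):
    """Return per-sample margins and per-class vote counts.

    votes  : (n_samples, n_classifiers) integer array of base-classifier predictions
    labels : (n_samples,) integer array of given (possibly noisy) labels
    """
    votes = np.asarray(votes)
    labels = np.asarray(labels)
    n_samples, n_classifiers = votes.shape
    # counts[i, c] = number of base classifiers that voted class c for sample i
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    rows = np.arange(n_samples)
    votes_for_label = counts[rows, labels]
    # Mask the given label to find the most voted *other* class
    others = counts.copy()
    others[rows, labels] = -1
    best_other = others.max(axis=1)
    # Assumed margin definition, in [-1, 1]; negative means misclassified
    margins = (votes_for_label - best_other) / n_classifiers
    return margins, counts

def rank_suspects(votes, labels, n_classes):
    """Order misclassified samples from most to least suspicious (assumption:
    the most negative margin comes first)."""
    labels = np.asarray(labels)
    margins, counts = ensemble_margins(votes, labels, n_classes)
    predicted = counts.argmax(axis=1)
    misclassified = np.flatnonzero(predicted != labels)
    order = misclassified[np.argsort(margins[misclassified], kind="stable")]
    return order, predicted

def remove_noise(labels, order, fraction):
    """Indices of samples kept after dropping the top `fraction` of suspects."""
    n_flagged = int(round(fraction * len(order)))
    return np.setdiff1d(np.arange(len(labels)), order[:n_flagged])

def correct_noise(labels, order, predicted, fraction):
    """Copy of labels where the top `fraction` of suspects take the ensemble's vote."""
    n_flagged = int(round(fraction * len(order)))
    corrected = np.asarray(labels).copy()
    flagged = order[:n_flagged]
    corrected[flagged] = predicted[flagged]
    return corrected

# Toy example: 4 samples, 3 base classifiers, 2 classes
votes = np.array([[0, 0, 0], [1, 1, 1], [1, 1, 0], [1, 1, 0]])
labels = np.array([0, 0, 0, 1])
order, predicted = rank_suspects(votes, labels, n_classes=2)
kept = remove_noise(labels, order, fraction=0.5)
corrected = correct_noise(labels, order, predicted, fraction=0.5)
```

By contrast, a majority vote filter flags every instance that more than half of the base classifiers misclassify, without ranking them. In the toy example it would flag both samples 1 and 2, whereas the ordering-based sketch with `fraction=0.5` flags only sample 1, the more confidently misclassified one.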
