Identification and correction of mislabeled training data for land cover classification based on ensemble margin

In remote sensing, where training data are typically ground-based, mislabeled training data are inevitable. This work addresses the mislabeling problem by using the ensemble margin to identify mislabeled training data, which are then either eliminated or corrected. The effectiveness of our class noise removal and correction methods is demonstrated on land cover mapping. A comparative analysis is conducted against the majority vote filter, a reference ensemble-based class noise filter.
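To make the two filtering ideas concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes the classic supervised voting margin (votes for the given label minus the largest vote count for any other class, divided by the ensemble size) and a majority vote filter that flags an instance when more than half of the ensemble disagrees with its label. The function names, the threshold of -0.5, and the toy land cover votes are illustrative assumptions; in practice the votes would come from ensemble members that did not train on the instance (for example, out-of-bag trees).

```python
from collections import Counter


def ensemble_margin(votes, label):
    """Supervised margin in [-1, 1] for one training instance.

    votes: class predictions from the T ensemble members.
    label: the (possibly wrong) training label of the instance.
    """
    counts = Counter(votes)
    v_label = counts.get(label, 0)
    # Largest vote count among all classes other than the given label.
    v_other = max((n for c, n in counts.items() if c != label), default=0)
    return (v_label - v_other) / len(votes)


def majority_vote_flag(votes, label):
    """Majority vote filter: flag if more than half the members disagree."""
    wrong = sum(1 for v in votes if v != label)
    return wrong > len(votes) / 2


def margin_flag(votes, label, threshold=-0.5):
    """Flag if the ensemble confidently prefers another class.

    The threshold is a hypothetical tuning parameter, not a value
    taken from this work.
    """
    return ensemble_margin(votes, label) <= threshold


def corrected_label(votes):
    """Correction step: relabel with the ensemble's plurality class."""
    return Counter(votes).most_common(1)[0][0]


# Toy example with 10 ensemble members and a training label of "water".
confident = ["forest"] * 8 + ["water"] * 2
ambiguous = ["forest"] * 4 + ["water"] * 3 + ["urban"] * 3

for name, votes in [("confident", confident), ("ambiguous", ambiguous)]:
    print(
        name,
        ensemble_margin(votes, "water"),
        majority_vote_flag(votes, "water"),
        margin_flag(votes, "water"),
    )
```

In this toy case both instances are flagged by the majority vote filter, but only the first has a strongly negative margin. The margin therefore separates confident disagreement, which suggests a labeling error, from an instance on which the ensemble is merely undecided.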
