Filtering mislabeled data for improving time series classification

The supervised classification of optical image time series allows the production of accurate land cover maps over large areas. However, the accuracy achieved by learning algorithms depends strongly on the quality of the reference data. Reference databases covering a large geographical area usually contain noisy data with a large number of mislabeled instances. These labeling errors result in longer training times, less accurate classifiers, and ultimately poorer maps. To address this issue, we propose a new iterative learning framework that removes mislabeled data from the training set. Specifically, the framework relies on a preliminary outlier rejection procedure based on the well-known Random Forest algorithm. The strategy is tested on the classification of Sentinel-2 image time series acquired in 2016, using an out-of-date reference dataset collected in 2014.
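The abstract does not specify how outliers are scored or when the iteration stops, so the following is only a minimal sketch of one way such an iterative Random Forest filter could look, not the authors' procedure. It assumes scikit-learn's `RandomForestClassifier`, treats a sample as suspect when its out-of-bag (OOB) prediction disagrees with its reference label, and stops when no sample is rejected or after a fixed number of rounds. The function name `filter_mislabeled`, its parameters, and the synthetic two-class data are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def filter_mislabeled(X, y, n_trees=100, max_iter=5, seed=0):
    """Iteratively drop samples whose OOB prediction disagrees with their label.

    Assumed rejection rule (not taken from the paper): OOB class != given label.
    Returns a boolean mask over the original samples (True = kept).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    keep = np.ones(len(y), dtype=bool)
    for it in range(max_iter):
        idx = np.flatnonzero(keep)
        forest = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed + it
        )
        forest.fit(X[idx], y[idx])
        # Each row holds class probabilities estimated only from the trees
        # whose bootstrap sample did not contain that training sample.
        oob_proba = forest.oob_decision_function_
        oob_pred = forest.classes_[np.argmax(oob_proba, axis=1)]
        suspect = oob_pred != y[idx]
        if not suspect.any():
            break  # nothing rejected in this round, so the training set is stable
        keep[idx[suspect]] = False
    return keep


# Synthetic stand-in for a noisy reference dataset: two well-separated
# classes with 10% of the labels deliberately flipped.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(5.0, 0.5, (100, 2))])
y_true = np.repeat([0, 1], 100)
y_noisy = y_true.copy()
flipped = rng.choice(200, size=20, replace=False)
y_noisy[flipped] = 1 - y_noisy[flipped]

keep = filter_mislabeled(X, y_noisy)
print("flipped samples removed:", int((~keep[flipped]).sum()), "of", len(flipped))
print("samples kept in total:", int(keep.sum()), "of", len(keep))
```

Using OOB predictions means each sample is judged only by trees that never saw it during training, which avoids fitting a separate validation model in every round. On real image time series the feature vectors would be per-pixel or per-object temporal profiles rather than 2-D points, and a stricter rule (for example a minimum OOB vote margin) may be preferable to plain disagreement.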
