An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images

Deep neural networks have established as a powerful tool for large scale supervised classification tasks. The state-of-the-art performances of deep neural networks are conditioned to the availability of large number of accurately labeled samples. In practice, collecting large scale accurately labeled datasets is a challenging and tedious task in most scenarios of remote sensing image analysis, thus cheap surrogate procedures are employed to label the dataset. Training deep neural networks on such datasets with inaccurate labels easily overfits to the noisy training labels and degrades the performance of the classification tasks drastically. To mitigate this effect, we propose an original solution with entropic optimal transportation. It allows to learn in an end-to-end fashion deep neural networks that are, to some extent, robust to inaccurately labeled samples. We empirically demonstrate on several remote sensing datasets, where both scene and pixel-based hyperspectral images are considered for classification. Our method proves to be highly tolerant to significant amounts of label noise and achieves favorable results against state-of-the-art methods.

[1]  Xingquan Zhu,et al.  Class Noise vs. Attribute Noise: A Quantitative Study , 2003, Artificial Intelligence Review.

[2]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transportation , 2013, NIPS 2013.

[3]  Brendan T. O'Connor,et al.  Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks , 2008, EMNLP.

[4]  Nicolas Courty,et al.  Large Scale Optimal Transport and Mapping Estimation , 2017, ICLR.

[5]  Pietro Perona,et al.  Inferring Ground Truth from Subjective Labelling of Venus Images , 1994, NIPS.

[6]  Lei Guo,et al.  When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[7]  Nicolas Courty,et al.  Optimal Transport for Domain Adaptation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Gabriel Peyré,et al.  Sinkhorn-AutoDiff: Tractable Wasserstein Learning of Generative Models , 2017 .

[9]  Gui-Song Xia,et al.  AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[10]  Nuno Vasconcelos,et al.  On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost , 2008, NIPS.

[11]  Ray J. Hickey,et al.  Artificial Intelligence Noise modelling and evaluating learning from examples , 2003 .

[12]  P. Rigollet,et al.  Entropic optimal transport is maximum-likelihood deconvolution , 2018, Comptes Rendus Mathematique.

[13]  Jordan M. Malof,et al.  Large-Scale Semantic Classification: Outcome of the First Year of Inria Aerial Image Labeling Benchmark , 2018, IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium.

[14]  Yongjun Zhang,et al.  Large-Scale Remote Sensing Image Retrieval by Deep Hashing Neural Networks , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[15]  Thomas Hofmann,et al.  Learning Aerial Image Segmentation From Online Maps , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[16]  Li Fei-Fei,et al.  MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels , 2017, ICML.

[17]  Yansheng Li,et al.  Multiple Feature Hashing Learning for Large-Scale Remote Sensing Image Retrieval , 2017, ISPRS Int. J. Geo Inf..

[18]  Gérard Dedieu,et al.  Filtering mislabeled data for improving time series classification , 2017, 2017 9th International Workshop on the Analysis of Multitemporal Remote Sensing Images (MultiTemp).

[19]  Aditya Krishna Menon,et al.  Learning with Symmetric Label Noise: The Importance of Being Unhinged , 2015, NIPS.

[20]  Gabriel Peyré,et al.  Stochastic Optimization for Large-scale Optimal Transport , 2016, NIPS.

[21]  Aritra Ghosh,et al.  Making risk minimization tolerant to label noise , 2014, Neurocomputing.

[22]  Deliang Fan,et al.  A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[23]  Ronald Kemker,et al.  High-Resolution Multispectral Dataset for Semantic Segmentation , 2017, ArXiv.

[24]  Devis Tuia,et al.  Detecting Mammals in UAV Images: Best Practices to address a substantially Imbalanced Dataset with Deep Learning , 2018, Remote Sensing of Environment.

[25]  Nagarajan Natarajan,et al.  Learning with Noisy Labels , 2013, NIPS.

[26]  Nicolas Courty,et al.  DeepJDOT: Deep Joint distribution optimal transport for unsupervised domain adaptation , 2018, ECCV.

[27]  Xiaoqiang Lu,et al.  Remote Sensing Image Scene Classification: Benchmark and State of the Art , 2017, Proceedings of the IEEE.

[28]  Min Bai,et al.  TorontoCity: Seeing the World with a Million Eyes , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Kevin Gimpel,et al.  Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise , 2018, NeurIPS.

[30]  Claire Marais-Sicre,et al.  Effect of Training Class Label Noise on Classification Performances for Land Cover Mapping with Satellite Image Time Series , 2017, Remote. Sens..

[31]  Alessandro Rudi,et al.  Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance , 2018, NeurIPS.

[32]  John A. Richards,et al.  Remote Sensing Digital Image Analysis: An Introduction , 1999 .

[33]  Aritra Ghosh,et al.  Robust Loss Functions under Label Noise for Deep Neural Networks , 2017, AAAI.

[34]  Pierre Alliez,et al.  High-Resolution Aerial Image Labeling With Convolutional Neural Networks , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[35]  Christian Heipke,et al.  USING LABEL NOISE ROBUST LOGISTIC REGRESSION FOR AUTOMATED UPDATING OF TOPOGRAPHIC GEOSPATIAL DATABASES , 2016 .

[36]  Samy Bengio,et al.  Understanding deep learning requires rethinking generalization , 2016, ICLR.

[37]  Zhiming Luo,et al.  Spectral–Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework , 2018, IEEE Transactions on Geoscience and Remote Sensing.

[38]  Christian Heipke,et al.  A label noise tolerant random forest for the classification of remote sensing data based on outdated maps for training , 2019, Comput. Vis. Image Underst..

[39]  Christian Heipke,et al.  Classification Under Label Noise Based on Outdated Maps , 2017 .

[40]  Richard Nock,et al.  Making Deep Neural Networks Robust to Label Noise: A Loss Correction Approach , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Nicolas Courty,et al.  Learning Wasserstein Embeddings , 2017, ICLR.

[42]  Bin Yang,et al.  Learning to Reweight Examples for Robust Deep Learning , 2018, ICML.

[43]  Joan Bruna,et al.  Training Convolutional Networks with Noisy Labels , 2014, ICLR 2014.

[44]  C. Villani Topics in Optimal Transportation , 2003 .

[45]  C. Villani Optimal Transport: Old and New , 2008 .

[46]  M. Verleysen,et al.  Classification in the Presence of Label Noise: A Survey , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[47]  Dumitru Erhan,et al.  Training Deep Neural Networks on Noisy Labels with Bootstrapping , 2014, ICLR.

[48]  Geoffrey E. Hinton,et al.  Learning to Label Aerial Images from Noisy Data , 2012, ICML.

[49]  Zhenfeng Shao,et al.  PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval , 2017, ISPRS Journal of Photogrammetry and Remote Sensing.

[50]  Patrice Marcotte,et al.  An overview of bilevel optimization , 2007, Ann. Oper. Res..

[51]  Fahad Shahbaz Khan,et al.  Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification , 2017, ArXiv.

[52]  M. Haklay How Good is Volunteered Geographical Information? A Comparative Study of OpenStreetMap and Ordnance Survey Datasets , 2010 .

[53]  Jacob Goldberger,et al.  Training deep neural-networks using a noise adaptation layer , 2016, ICLR.

[54]  N. Papadakis Optimal Transport for Image Processing , 2015 .

[55]  Arash Vahdat,et al.  Toward Robustness against Label Noise in Training Deep Discriminative Neural Networks , 2017, NIPS.

[56]  Yale Song,et al.  Learning from Noisy Labels with Distillation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[57]  Bertrand Le Saux,et al.  Beyond RGB: Very High Resolution Urban Remote Sensing With Multimodal Deep Networks , 2017, ISPRS Journal of Photogrammetry and Remote Sensing.

[58]  Xiaogang Wang,et al.  Learning from massive noisy labeled data for image classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  J. Paul Brooks,et al.  Support Vector Machines with the Ramp Loss and the Hard Margin Loss , 2011, Oper. Res..

[60]  Nicolas Courty,et al.  Joint distribution optimal transportation for domain adaptation , 2017, NIPS.

[61]  Jared Frank,et al.  Effect of Label Noise on the Machine-Learned Classification of Earthquake Damage , 2017, Remote. Sens..

[62]  Kim-Kwang Raymond Choo,et al.  Spectral–spatial multi-feature-based deep learning for hyperspectral remote sensing image classification , 2016, Soft Computing.

[63]  Francisco Herrera,et al.  Analyzing the presence of noise in multi-class problems: alleviating its influence with the One-vs-One decomposition , 2012, Knowledge and Information Systems.