Pushing the right boundaries matters! Wasserstein Adversarial Training for Label Noise

Noisy labels often occur in vision datasets, especially when they come from crowdsourcing or Web scraping. In this paper, we propose a new regularization method that enables learning robust classifiers in the presence of noisy data. To achieve this goal, we augment the virtual adversarial loss with a Wasserstein distance. This distance allows us to take into account specific relations between classes by leveraging the geometric properties of this optimal transport distance. Notably, we encode the class similarities in the ground cost used to compute the Wasserstein distance. As a consequence, we can promote smoothness between classes that are very dissimilar, while keeping the classification decision function sufficiently complex for similar classes. While designing this ground cost can be left as a problem-specific modeling task, we show in this paper that using the semantic relations between class names already leads to good results. Our proposed Wasserstein Adversarial Training (WAT) outperforms the state of the art on four datasets corrupted with noisy labels: three classical benchmarks and one real-world case of remote sensing image semantic segmentation.
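To make the idea concrete, here is a minimal PyTorch-style sketch of such a regularizer, not the authors' implementation: the function names (`ground_cost_from_embeddings`, `sinkhorn_distance`, `wat_regularizer`) and all hyperparameter values are illustrative assumptions. It builds a ground cost from class-name embeddings (e.g., word vectors), finds a virtual-adversarial-style perturbation that increases an entropic (Sinkhorn) approximation of the Wasserstein distance between the clean and perturbed predictions, and returns that distance as a smoothness penalty.

```python
# Hypothetical sketch of a Wasserstein adversarial regularizer; assumes a
# generic PyTorch classifier `model` and pre-computed class-name embeddings.
import torch
import torch.nn.functional as F

def ground_cost_from_embeddings(class_embeddings):
    """Cosine-distance ground cost between class-name embeddings (K, D) -> (K, K)."""
    e = F.normalize(class_embeddings, dim=1)
    return 1.0 - e @ e.t()

def sinkhorn_distance(p, q, C, eps=0.1, n_iters=50):
    """Entropic OT distance between batched histograms p, q of shape (B, K) under cost C."""
    K = torch.exp(-C / eps)                     # Gibbs kernel (K, K)
    u = torch.ones_like(p)
    for _ in range(n_iters):                    # unrolled Sinkhorn iterations
        v = q / (u @ K + 1e-9)
        u = p / (v @ K.t() + 1e-9)
    pi = u.unsqueeze(2) * K.unsqueeze(0) * v.unsqueeze(1)   # transport plans (B, K, K)
    return (pi * C.unsqueeze(0)).sum(dim=(1, 2)).mean()

def wat_regularizer(model, x, C, xi=1e-6, epsilon=2.0, eps_sinkhorn=0.1):
    """Wasserstein adversarial smoothness term around the inputs x."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)          # clean predictions, treated as fixed
    # One power-iteration step to find the direction that most increases the distance.
    d = torch.randn_like(x)
    d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)
    d.requires_grad_(True)
    q = F.softmax(model(x + d), dim=1)
    grad = torch.autograd.grad(sinkhorn_distance(p, q, C, eps=eps_sinkhorn), d)[0]
    r_adv = epsilon * F.normalize(grad.flatten(1), dim=1).view_as(x)
    # Penalize the Wasserstein distance between clean and adversarially perturbed predictions.
    q_adv = F.softmax(model(x + r_adv), dim=1)
    return sinkhorn_distance(p, q_adv, C, eps=eps_sinkhorn)
```

In a training loop, this term would presumably be added, weighted by a trade-off coefficient, to the standard cross-entropy loss on the (noisy) labels; because the ground cost is small between semantically similar classes, perturbations that move mass between such classes are penalized less than those crossing dissimilar classes.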
