Derivative Manipulation for General Example Weighting.

We propose derivative manipulation (DM) for training accurate and robust softmax-based deep neural networks, for two reasons: (1) in gradient-based optimisation, manipulating the derivative directly is more straightforward than designing loss functions and has an immediate effect on model updates; (2) a loss function's derivative magnitude function can be interpreted as a weighting scheme, since the magnitude of an example's derivative determines how much that example contributes to the update of a model. Manipulating the derivative therefore amounts to adjusting the weighting scheme. DM simply modifies the derivative magnitude, via transformation and normalisation, after which the derivative magnitude function is termed the emphasis density function (EDF). An EDF is a formula expressing an example weighting scheme, and many candidate EDFs can be derived from common probability density functions (PDFs). We demonstrate the effectiveness of the DM formulation empirically through extensive experiments on both vision and language tasks, especially under adverse conditions such as noisy labels and sample imbalance.
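The following is a minimal PyTorch sketch of the idea, not the paper's exact formulation: each example's cross-entropy gradient is rescaled by a detached weight computed from a beta-shaped emphasis density over the predicted true-class probability. The exponents `a` and `b`, the batch-wise normalisation, and the function name `dm_weighted_loss` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dm_weighted_loss(logits, targets, a=0.5, b=2.0, eps=1e-8):
    """Cross-entropy weighted by an emphasis density function (EDF).

    The weights are detached from the graph, so scaling each example's
    loss rescales that example's gradient magnitude, which is one simple
    way to realise derivative manipulation as an example weighting scheme.
    """
    probs = F.softmax(logits, dim=1)
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # true-class probability

    # Illustrative beta-shaped EDF: w(p) ~ p^a * (1 - p)^b,
    # normalised over the mini-batch so the weights sum to the batch size.
    with torch.no_grad():
        w = p_y.clamp(min=eps) ** a * (1.0 - p_y).clamp(min=eps) ** b
        w = w * len(w) / (w.sum() + eps)

    per_example_ce = F.cross_entropy(logits, targets, reduction="none")
    return (w * per_example_ce).mean()

# Usage sketch: loss = dm_weighted_loss(model(x), y); loss.backward()
```

Choosing a different PDF for the EDF simply changes which region of the true-class probability receives the most emphasis, e.g., down-weighting very low-confidence examples that are likely mislabelled.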
