Generalized Data Weighting via Class-level Gradient Manipulation

Label noise and class imbalance are two major issues that coexist in real-world datasets. To alleviate both issues, state-of-the-art methods reweight each instance by leveraging a small amount of clean and unbiased data. However, these methods overlook class-level information within each instance, which can be exploited to further improve performance. To this end, in this paper we propose Generalized Data Weighting (GDW), which simultaneously mitigates label noise and class imbalance by manipulating gradients at the class level. Specifically, GDW unrolls the loss gradient into class-level gradients via the chain rule and reweights the flow of each gradient separately. In this way, GDW achieves remarkable performance improvements on both issues. Beyond the performance gain, GDW obtains class-level weights efficiently, without introducing any extra computational cost compared with instance-weighting methods: it performs a single gradient descent step on the class-level weights, which relies only on intermediate gradients. Extensive experiments in various settings verify the effectiveness of GDW. For example, GDW outperforms state-of-the-art methods by 2.56% under the 60% uniform-noise setting on CIFAR-10. Our code is available at https://github.com/GGchen1997/GDW-NIPS2021.
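To make the class-level reweighting concrete, the sketch below illustrates the idea for a softmax classifier in PyTorch. It is a minimal sketch under our own assumptions, not the authors' implementation: for cross-entropy, the gradient with respect to the logits decomposes into one component per class, softmax(z) - one_hot(y), and a learnable weight tensor (here named `class_weights`, a hypothetical name; GDW additionally constrains these weights, which we omit) scales each component before it flows back into the network through a surrogate loss.

```python
import torch
import torch.nn.functional as F

def class_weighted_surrogate(logits, targets, class_weights):
    """Reweight class-level gradient flows for cross-entropy (GDW-style sketch).

    logits:        (B, C) raw model outputs
    targets:       (B,)   integer class labels
    class_weights: (B, C) per-instance, per-class weights (hypothetical
                   learnable tensor; normalization/constraints omitted)
    """
    num_classes = logits.size(1)
    with torch.no_grad():
        # Cross-entropy gradient w.r.t. the logits splits into C class-level
        # components: softmax(z) - one_hot(y).
        grad = F.softmax(logits, dim=1) - F.one_hot(targets, num_classes).float()

    # Reweight each class-level gradient flow separately.
    weighted_grad = class_weights.detach() * grad

    # Surrogate loss: its gradient w.r.t. `logits` equals `weighted_grad`, so
    # backward() pushes the reweighted class-level gradients into the model.
    return (weighted_grad * logits).sum() / logits.size(0)

# Usage sketch (`model`, `x`, `y` are placeholders). The meta step that
# updates `class_weights` on a small clean set, reusing intermediate
# gradients, would run separately:
# loss = class_weighted_surrogate(model(x), y, class_weights)
# loss.backward()
```

Because `weighted_grad` is detached, this surrogate only steers the model's gradients; updating `class_weights` itself is the separate meta step the abstract describes, which reuses the intermediate gradients already computed.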
