Faster Meta Update Strategy for Noise-Robust Deep Learning

Deep neural networks are prone to overfitting biased training data. To address this issue, meta-learning employs a meta-model to correct the training bias. Despite promising performance, extremely slow training remains the bottleneck of these meta-learning approaches. In this paper, we introduce a novel Faster Meta Update Strategy (FaMUS) that replaces the most expensive step in the meta-gradient computation with a faster layer-wise approximation. We empirically find that FaMUS yields a reasonably accurate and low-variance approximation of the meta gradient. We conduct extensive experiments to verify the proposed method on two tasks, showing that it saves two-thirds of the training time while maintaining comparable, and sometimes better, generalization performance. In particular, our method achieves state-of-the-art performance on both synthetic and realistic noisy labels, and obtains promising results on long-tailed recognition on standard benchmarks. Code is released at https://github.com/youjiangxu/FaMUS.
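To make the core idea concrete, below is a minimal PyTorch sketch of meta-learned sample reweighting with a layer-wise approximation of the meta gradient, in the spirit of FaMUS. This is an illustration under stated assumptions, not the paper's implementation: the names (`weight_net`, `keep_layers`, `lr_inner`) are hypothetical, and where FaMUS learns which layer-wise gradient contributions to aggregate, this sketch simply keeps a fixed subset of layers and drops the second-order term for the rest.

```python
# Hedged sketch of meta-gradient sample reweighting with a layer-wise
# approximation. Assumes a Meta-Weight-Net-style setup: `weight_net` maps a
# per-sample loss (B, 1) to a sample weight (B, 1); FaMUS's learned layer-wise
# gating is replaced here by a fixed `keep_layers` subset for illustration.
import torch
import torch.nn as nn
from torch.func import functional_call  # PyTorch >= 2.0


def meta_step(model, weight_net, x_noisy, y_noisy, x_clean, y_clean,
              lr_inner=0.1, keep_layers=None):
    # 1) Per-sample losses on the noisy batch, reweighted by the meta-model.
    logits = model(x_noisy)
    losses = nn.functional.cross_entropy(logits, y_noisy, reduction="none")
    w = weight_net(losses.detach().unsqueeze(1)).squeeze(1)
    train_loss = (w * losses).mean()

    # 2) Differentiable "virtual" SGD step, producing fast weights theta'(w).
    names, params = zip(*model.named_parameters())
    grads = torch.autograd.grad(train_loss, params, create_graph=True)

    # 3) Layer-wise approximation: keep the second-order path (the expensive
    #    part of the meta gradient) only for selected layers; for all other
    #    layers, detach the gradient so no second-order term is computed.
    fast = {}
    for n, p, g in zip(names, params, grads):
        if keep_layers is None or any(k in n for k in keep_layers):
            fast[n] = p - lr_inner * g           # meta gradient flows here
        else:
            fast[n] = p - lr_inner * g.detach()  # skipped: first-order only

    # 4) Meta loss on a small clean set; its gradient w.r.t. weight_net's
    #    parameters is the (layer-wise approximated) meta gradient.
    meta_logits = functional_call(model, fast, (x_clean,))
    meta_loss = nn.functional.cross_entropy(meta_logits, y_clean)
    return train_loss, meta_loss
```

In a full training loop, one would call `meta_step`, update `weight_net` via `meta_loss.backward()`, then recompute the weighted loss with the refreshed sample weights and take the real update on `model`. Restricting the second-order backpropagation to a few layers is what removes the bulk of the meta-gradient cost, which is the saving FaMUS targets.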
