A Holistic View of Label Noise Transition Matrix in Deep Learning and Beyond

In this paper, we explore learning statistically consistent classifiers under label noise by estimating the noise transition matrix (T). We first provide a holistic view of existing T-estimation methods, including those with and without anchor point assumptions. We unify them into the Minimum Geometric Envelope Operator (MGEO) framework, which seeks the smallest T (under a certain metric) whose induced convex hull encloses the posteriors of all the training data. Although MGEO methods show appealing theoretical properties and empirical results, we find that they are prone to failure when the noisy posterior estimation is imperfect, which is inevitable in practice. Specifically, we show that MGEO methods are inconsistent even with infinite samples if the noisy posterior is not estimated accurately. In view of this, we make the first effort to address this issue by proposing a novel T-estimation framework through the lens of bilevel optimization, which we term RObust Bilevel OpTimization (ROBOT). ROBOT paves a new road beyond the MGEO framework and enjoys strong theoretical properties: identifiability, consistency, and finite-sample generalization guarantees. Notably, ROBOT neither requires perfect posterior estimation nor assumes the existence of anchor points. We further demonstrate theoretically that ROBOT remains robust in cases where MGEO methods fail. Experimentally, our framework also shows superior performance across multiple benchmarks. Our code is released at https://github.com/pipilurj/ROBOT.
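To make the bilevel formulation concrete, the following is a minimal sketch (not the authors' released implementation) of estimating a noise transition matrix T via bilevel optimization: the inner problem trains a classifier with a forward-corrected loss under the current T, and the outer problem updates T by differentiating through a one-step unrolled inner update. The synthetic data, linear model, learning rates, held-out noisy split, and single-step hypergradient are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of bilevel T-estimation (illustrative assumptions throughout).
import torch
import torch.nn.functional as F
from torch.func import functional_call

torch.manual_seed(0)
num_classes, dim, n = 3, 20, 512

# Synthetic noisy-label data standing in for a real dataset.
x, noisy_y = torch.randn(n, dim), torch.randint(num_classes, (n,))
x_held, y_held = torch.randn(128, dim), torch.randint(num_classes, (128,))

model = torch.nn.Linear(dim, num_classes)
# Outer variable: T parameterized row-wise so each row is a probability vector.
T_logits = torch.zeros(num_classes, num_classes, requires_grad=True)
opt_T = torch.optim.Adam([T_logits], lr=1e-2)
inner_lr = 0.1

def corrected_loss(params, xb, yb, T_logits):
    """Forward-corrected negative log-likelihood: noisy posterior = clean posterior @ T."""
    logits = functional_call(model, params, (xb,))
    clean_post = F.softmax(logits, dim=1)
    T = F.softmax(T_logits, dim=1)          # rows of T sum to one
    noisy_post = clean_post @ T
    return F.nll_loss(torch.log(noisy_post + 1e-12), yb)

for step in range(200):
    params = dict(model.named_parameters())

    # Inner problem: one gradient step on the classifier under the current T.
    inner = corrected_loss(params, x, noisy_y, T_logits)
    grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
    updated = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}

    # Outer problem: evaluate the unrolled classifier on held-out noisy data
    # and backpropagate through the inner step to update T.
    outer = corrected_loss(updated, x_held, y_held, T_logits)
    opt_T.zero_grad()
    outer.backward()
    opt_T.step()

    # Commit the inner update to the model (detached from the graph).
    with torch.no_grad():
        for p, u in zip(model.parameters(), updated.values()):
            p.copy_(u)

print("Estimated T:\n", F.softmax(T_logits, dim=1))
```

The one-step unroll is only one way to approximate the hypergradient of the outer objective with respect to T; longer unrolls or implicit differentiation are natural alternatives under the same bilevel view.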
