Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

Real-world large-scale datasets are heteroskedastic and imbalanced: labels have varying levels of uncertainty, and label distributions are long-tailed. Heteroskedasticity and imbalance challenge deep learning algorithms because it is difficult to distinguish among mislabeled, ambiguous, and rare examples. Addressing heteroskedasticity and imbalance simultaneously, however, remains under-explored. We propose a data-dependent regularization technique for heteroskedastic datasets that regularizes different regions of the input space differently. Inspired by the theoretical derivation of the optimal regularization strength in a one-dimensional nonparametric classification setting, our approach adaptively regularizes data points in higher-uncertainty, lower-density regions more heavily. We test our method on several benchmark tasks, including a real-world heteroskedastic and imbalanced dataset, WebVision. Our experiments corroborate our theory and demonstrate a significant improvement over existing methods for noise-robust deep learning.
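
The abstract does not spell out how the adaptive regularization weights are computed. The sketch below is a minimal PyTorch illustration of the general recipe only: it assumes predictive entropy as the uncertainty proxy, k-nearest-neighbor feature distances as the (inverse) density proxy, and a per-example confidence penalty as the regularizer. The function names and hyperparameters (`k`, `lam`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def adaptive_reg_weights(features, probs, k=5, eps=1e-8):
    """Per-example regularization weights: larger for higher-uncertainty,
    lower-density examples (hypothetical proxies, not the paper's exact ones)."""
    # Uncertainty proxy: predictive entropy of the softmax output.
    entropy = -(probs * probs.clamp_min(eps).log()).sum(dim=1)
    # Density proxy: mean distance to the k nearest neighbors in feature
    # space; a larger distance indicates a lower-density region.
    dists = torch.cdist(features, features)                     # (n, n) pairwise distances
    knn_dist = dists.topk(k + 1, largest=False).values[:, 1:]   # drop self (distance 0)
    sparsity = knn_dist.mean(dim=1)
    weights = entropy * sparsity
    return weights / (weights.mean() + eps)                     # normalize to mean ~1

def loss_with_adaptive_reg(logits, targets, features, lam=0.1):
    """Cross-entropy plus an adaptively weighted confidence penalty."""
    probs = F.softmax(logits, dim=1)
    with torch.no_grad():  # treat weights as fixed per-example coefficients
        w = adaptive_reg_weights(features.detach(), probs, k=5)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    # Penalize confident predictions more heavily where w is large,
    # i.e., in higher-uncertainty, lower-density regions.
    reg = -(w * entropy).mean()
    return F.cross_entropy(logits, targets) + lam * reg

# Toy usage: 32 examples, 64-d features, 10 classes.
feats = torch.randn(32, 64)
logits = torch.randn(32, 10, requires_grad=True)
targets = torch.randint(0, 10, (32,))
loss = loss_with_adaptive_reg(logits, targets, feats)
loss.backward()
```

Detaching the weights from the computation graph is a deliberate choice in this sketch: the adaptive strengths act as data-dependent coefficients rather than quantities the optimizer can game by reshaping the feature space.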
