Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels

In weakly supervised learning, the unbiased risk estimator (URE) is a powerful tool for training classifiers when training and test data are drawn from different distributions. Nevertheless, UREs lead to overfitting in many problem settings when the models are complex, such as deep networks. In this paper, we investigate the reasons for such overfitting by studying a weakly supervised problem called learning with complementary labels. We argue that the quality of gradient estimation matters more in risk minimization. Theoretically, we show that a URE gives an unbiased gradient estimator (UGE). Practically, however, UGEs may suffer from huge variance, which causes the empirical gradients to be far away from the true gradients during most of the minimization process. To this end, we propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance and makes the empirical gradients better aligned with the true gradients in direction. Thanks to this characteristic, SCL successfully mitigates the overfitting issue and improves URE-based methods.
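
As a concrete illustration of the bias-variance trade-off described above, the following is a minimal PyTorch sketch (not the authors' reference implementation) contrasting a URE loss for uniformly distributed complementary labels with a surrogate complementary loss of the negative-learning form. The function names, the choice of cross-entropy as the base loss, and the exact form of the surrogate are assumptions made for illustration.

```python
# Minimal sketch, assuming uniform complementary labels and cross-entropy
# as the base loss; this is an illustration, not the authors' code.
import torch
import torch.nn.functional as F


def ure_loss(logits: torch.Tensor, comp_labels: torch.Tensor) -> torch.Tensor:
    """URE under the uniform assumption:
    R(f) = E[ sum_k loss(f(x), k) - (K - 1) * loss(f(x), y_bar) ].
    Unbiased, but per-example terms can be negative, so the empirical
    gradient tends to have large variance with flexible models."""
    k = logits.shape[1]
    log_probs = F.log_softmax(logits, dim=1)
    sum_over_classes = -log_probs.sum(dim=1)                    # sum_k CE(f(x), k)
    comp_term = -log_probs.gather(1, comp_labels.view(-1, 1)).squeeze(1)
    return (sum_over_classes - (k - 1) * comp_term).mean()


def scl_nl_loss(logits: torch.Tensor, comp_labels: torch.Tensor) -> torch.Tensor:
    """A surrogate complementary loss in the negative-learning style:
    -log(1 - p_{y_bar}(x)). Biased, but bounded below, which keeps the
    empirical gradient direction closer to the true one."""
    probs = F.softmax(logits, dim=1)
    p_bar = probs.gather(1, comp_labels.view(-1, 1)).squeeze(1)
    return -torch.log(1.0 - p_bar + 1e-12).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(8, 10, requires_grad=True)    # 8 examples, 10 classes
    comp_labels = torch.randint(0, 10, (8,))           # one class each example does NOT belong to
    print("URE loss:   ", ure_loss(logits, comp_labels).item())
    print("SCL-NL loss:", scl_nl_loss(logits, comp_labels).item())
```

In this sketch the URE objective can go negative on a mini-batch, while the surrogate loss only pushes the predicted probability of the complementary class down, which is one intuition for why it yields lower-variance, better-aligned empirical gradients.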
