Binary classification with corrupted labels

In a binary classification problem where the goal is to fit an accurate predictor, corrupted labels in the training data set may create an additional challenge. However, in settings where likelihood maximization is poorly behaved (for example, when the positive and negative labels are perfectly separable), a small fraction of corrupted labels can improve performance by ensuring robustness. In this work, we establish that in such settings corruption acts as a form of regularization, and we compute precise upper bounds on estimation error in the presence of corruptions. Our results suggest that corrupted data points are beneficial only up to a small fraction of the total sample, scaling with the square root of the sample size.
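
The separability point can be made concrete with a minimal Python sketch. It assumes a toy one-dimensional logistic model with no intercept, fitted by plain gradient ascent on the log-likelihood. The data, the helper name `fit_logistic_slope`, the step size, and the iteration count are illustrative choices. This is not the estimator or the bounds analyzed in this work. With perfectly separable labels, the slope estimate keeps growing because no finite maximizer exists. Flipping a single label gives the likelihood a finite maximizer, which is the regularizing effect described above.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_slope(x, y, steps=20000, lr=0.1):
    """Hypothetical helper: gradient ascent on the logistic log-likelihood
    sum_i [y_i * log p_i + (1 - y_i) * log(1 - p_i)], p_i = sigmoid(beta * x_i),
    for a single slope beta (no intercept). Returns the path of beta."""
    beta = 0.0
    path = np.empty(steps)
    for t in range(steps):
        # Gradient of the log-likelihood with respect to beta.
        grad = np.sum((y - sigmoid(beta * x)) * x)
        beta += lr * grad
        path[t] = beta
    return path


# Toy data (illustrative assumption): labels are perfectly separated at x = 0.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y_clean = np.array([0.0, 0.0, 1.0, 1.0])

# Same inputs with one corrupted label: the point at x = -1 is flipped to 1.
y_corrupt = np.array([0.0, 1.0, 1.0, 1.0])

path_clean = fit_logistic_slope(x, y_clean)
path_corrupt = fit_logistic_slope(x, y_corrupt)

half = len(path_clean) // 2
print("separable: beta at half run =", path_clean[half - 1], "at end =", path_clean[-1])
print("corrupted: beta at half run =", path_corrupt[half - 1], "at end =", path_corrupt[-1])
```

On the separable data, every gradient term is positive, so the slope increases at every step. It grows roughly like the logarithm of the iteration count and never settles. On the corrupted data, the slope converges to a fixed value. Plain gradient ascent is used here only because it makes the divergence visible. A Newton or IRLS solver would hit the same non-existence of the maximizer on the separable data.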
