Learning from Multiple Corrupted Sources, with Application to Learning from Label Proportions

We study binary classification in the setting where the learner is presented with multiple corrupted training samples, with possibly different sample sizes and degrees of corruption, and introduce an approach based on minimizing a weighted combination of corruption-corrected empirical risks. We establish a generalization error bound, and further show that the bound is optimized when the weights are certain interpretable and intuitive functions of the sample sizes and degrees of corruption. We then apply this framework to the problem of learning from label proportions (LLP) and propose an algorithm that enjoys the most general statistical performance guarantees known for LLP. Experiments demonstrate the utility of our theory.
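
To make the approach concrete, here is a minimal sketch, not the paper's algorithm: it instantiates the idea for the simplest corruption model, class-conditional label noise, where each source has its own sample size and flip rates. Each source's empirical risk is debiased with the standard backward loss correction for known noise rates, and the corrected risks are combined with weights that grow with sample size and shrink with corruption severity. The weighting rule n_i(1 - rho_p - rho_n)^2, the function names, and the synthetic data are all illustrative assumptions, a heuristic stand-in for the bound-optimal weights derived in the paper.

```python
# A minimal sketch (assumed names and weighting rule, not the paper's method):
# combine backward-corrected empirical risks from several noisy sources.
import numpy as np
from scipy.optimize import minimize

def corrected_risk(w, X, y, rho_p, rho_n):
    """Backward-corrected logistic risk: an unbiased estimate of the clean
    risk from labels y in {-1,+1} whose positives were flipped with
    probability rho_p and negatives with probability rho_n."""
    margins = y * (X @ w)
    l_keep = np.logaddexp(0.0, -margins)  # logistic loss on the observed label
    l_flip = np.logaddexp(0.0, margins)   # logistic loss on the flipped label
    rho_obs = np.where(y == 1, rho_p, rho_n)    # flip rate of the observed class
    rho_other = np.where(y == 1, rho_n, rho_p)  # flip rate of the other class
    return np.mean(((1.0 - rho_other) * l_keep - rho_obs * l_flip)
                   / (1.0 - rho_p - rho_n))

def weighted_risk(w, sources, lam=1e-3):
    """Weighted combination of corruption-corrected empirical risks plus an
    l2 penalty. Heuristic weights: proportional to sample size, vanishing as
    the total flip rate approaches the unlearnable regime rho_p + rho_n = 1."""
    raw = np.array([len(y) * (1.0 - rp - rn) ** 2 for _, y, rp, rn in sources])
    weights = raw / raw.sum()
    risks = [corrected_risk(w, X, y, rp, rn) for X, y, rp, rn in sources]
    return float(np.dot(weights, risks)) + lam * float(w @ w)

def make_source(rng, n, rho_p, rho_n):
    """Synthetic source: linearly separable data with labels flipped at the
    given class-conditional rates."""
    X = rng.normal(size=(n, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    flip = rng.random(n) < np.where(y == 1, rho_p, rho_n)
    return X, np.where(flip, -y, y), rho_p, rho_n

rng = np.random.default_rng(0)
# A large, heavily corrupted source and a small, nearly clean one.
sources = [make_source(rng, 2000, 0.40, 0.35), make_source(rng, 200, 0.05, 0.05)]
w_hat = minimize(weighted_risk, np.zeros(2), args=(sources,)).x
print("learned linear classifier:", w_hat)
```

The LLP application described in the abstract is more general than this sketch: there, each bag (or pair of bags) with known label proportions plays the role of a corrupted source, and the correction and weights are derived accordingly; the class-conditional noise correction above is only the simplest analogue.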
