Cost-Sensitive Self-Training for Optimizing Non-Decomposable Metrics

Self-training-based semi-supervised learning algorithms have enabled the training of highly accurate deep neural networks using only a fraction of labeled data. However, most work on self-training has focused on improving accuracy, whereas practical machine learning systems often have complex goals (e.g., maximizing the minimum recall across classes) that are non-decomposable in nature. In this work, we introduce the Cost-Sensitive Self-Training (CSST) framework, which generalizes self-training-based methods to optimize non-decomposable metrics. We prove that our framework can better optimize the desired non-decomposable metric using unlabeled data, under data distribution assumptions similar to those made in the analysis of self-training. Using the proposed CSST framework, we obtain practical self-training methods (for both vision and NLP tasks) for optimizing different non-decomposable metrics with deep neural networks. Our results demonstrate that CSST improves over the state-of-the-art in the majority of cases across datasets and objectives.
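To make the notion of a non-decomposable metric concrete, below is a minimal sketch (not taken from the paper) of the min-recall objective mentioned above. The function name `min_recall` and the toy data are illustrative assumptions; the point is that, unlike accuracy, this metric couples all examples of each class and therefore cannot be written as an average of per-example losses.

```python
# Illustrative sketch: minimum per-class recall, a non-decomposable metric.
import numpy as np

def min_recall(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int) -> float:
    """Return the minimum of the per-class recalls."""
    recalls = []
    for c in range(num_classes):
        mask = (y_true == c)
        if mask.sum() == 0:
            continue  # skip classes absent from this evaluation set
        recalls.append((y_pred[mask] == c).mean())
    return float(min(recalls))

# Example: a classifier with 90% overall accuracy can still have a
# min-recall of 0 if it ignores a rare class.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)      # always predicts the majority class
print(min_recall(y_true, y_pred, 2))   # -> 0.0, despite 90% accuracy
```

Optimizing such an objective requires reweighting or constraining the per-class errors (a cost-sensitive formulation), which is what CSST carries over to the self-training setting.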
