Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning

In this paper we revisit the idea of pseudo-labeling in the context of semi-supervised learning, where a learning algorithm has access to a small set of labeled samples and a large set of unlabeled samples. Pseudo-labeling works by assigning labels to samples in the unlabeled set using a model trained on a combination of the labeled samples and any previously pseudo-labeled samples, and iteratively repeating this process in a self-training cycle. Current methods seem to have abandoned this approach in favor of consistency regularization methods, which train models under a combination of different styles of self-supervised losses on the unlabeled samples and standard supervised losses on the labeled samples. We empirically demonstrate that pseudo-labeling can in fact be competitive with the state of the art, while being more resilient to out-of-distribution samples in the unlabeled set. We identify two key factors that allow pseudo-labeling to achieve such remarkable results: (1) applying curriculum learning principles, and (2) avoiding concept drift by restarting the model parameters before each self-training cycle. We obtain 94.91% accuracy on CIFAR-10 using only 4,000 labeled samples, and 68.87% top-1 accuracy on ImageNet-ILSVRC using only 10% of the labeled samples.
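The self-training cycle built on these two factors can be summarized in a short sketch. The Python snippet below is only an illustration of the recipe described in the abstract, not the authors' implementation: the make_model factory, the concrete percentile schedule (admitting 20%, 40%, ..., 100% of the unlabeled set), and the use of the maximum softmax probability as the confidence score are assumptions made for this example.

import numpy as np

def curriculum_labeling(make_model, X_lab, y_lab, X_unlab, num_cycles=5):
    """Sketch of a curriculum-driven pseudo-labeling self-training loop.

    make_model is a caller-supplied factory (hypothetical, for illustration)
    that returns a fresh classifier with scikit-learn-style
    fit / predict_proba methods.
    """
    X_train, y_train = X_lab, y_lab
    for cycle in range(1, num_cycles + 1):
        # Key factor (2): restart from freshly initialized parameters at the
        # start of every self-training cycle to avoid concept drift.
        model = make_model()
        model.fit(X_train, y_train)

        # Score every unlabeled sample with the current model.
        probs = model.predict_proba(X_unlab)      # (n_unlabeled, n_classes)
        confidence = probs.max(axis=1)
        pseudo_labels = probs.argmax(axis=1)

        # Key factor (1): a curriculum over the unlabeled set. Admit only the
        # most confident fraction of pseudo-labels, growing that fraction each
        # cycle (20%, 40%, ..., 100% here) so that easy samples enter first.
        keep_fraction = cycle / num_cycles
        threshold = np.percentile(confidence, 100.0 * (1.0 - keep_fraction))
        mask = confidence >= threshold

        # The next cycle trains on the labeled set plus selected pseudo-labels.
        X_train = np.concatenate([X_lab, X_unlab[mask]])
        y_train = np.concatenate([y_lab, pseudo_labels[mask]])
    return model

As a quick test, make_model could be as simple as lambda: sklearn.linear_model.LogisticRegression(max_iter=1000); in the setting studied in the paper it would instead construct a fresh deep network each cycle.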
