Big Self-Supervised Models are Strong Semi-Supervised Learners

One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to most previous approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of a big (deep and wide) network during pretraining and fine-tuning. We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2 (a modification of SimCLR), supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9\% ImageNet top-1 accuracy with just 1\% of the labels ($\le$13 labeled images per class) using ResNet-50, a $10\times$ improvement in label efficiency over the previous state-of-the-art. With 10\% of the labels, ResNet-50 trained with our method achieves 77.5\% top-1 accuracy, outperforming standard supervised training with all of the labels.
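
As a rough illustration of the third step, the sketch below implements the standard temperature-scaled distillation objective on a batch of unlabeled examples in plain NumPy. It is a minimal sketch, not the paper's training code: the function names (`softmax`, `distillation_loss`), the temperature default, and the random logits are hypothetical stand-ins for a fine-tuned big teacher network and a smaller student.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last (class) dimension."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy between the teacher's softened predictions and the
    student's, averaged over a batch of unlabeled images.
    Shapes: [batch, num_classes]."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_log_probs = np.log(softmax(student_logits, temperature) + 1e-12)
    return -(teacher_probs * student_log_probs).sum(axis=-1).mean()

# Hypothetical usage: a fine-tuned big ResNet (teacher) scores a batch of
# unlabeled images, and a smaller student network is trained to match it.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(8, 1000))  # e.g. ImageNet's 1000 classes
student_logits = rng.normal(size=(8, 1000))
print(distillation_loss(teacher_logits, student_logits))
```

In a real training loop the teacher is frozen after fine-tuning and only the student's parameters are updated to minimize this loss over the unlabeled set, which is what lets the task-specific knowledge transfer into a much smaller model.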
