论文信息 - Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

In this paper, we present a novel approach, Momentum Teacher, for student-teacher based self-supervised learning. The approach performs momentum update on both network weights and batch normalization (BN) statistics. The teacher’s weight is a momentum update of the student, and the teacher’s BN statistics is a momentum update of those in history. The Momentum Teacher is simple and efficient. It can achieve the state of the art results (74.5%) under ImageNet linear evaluation protocol using small-batch size(e.g., 128), without requiring large-batch training on special hardware like TPU or inefficient across GPU operation (e.g., shuffling BN, synced BN). Our implementation and pre-trained models will be given on GitHub1.

[1] Abhinav Shrivastava,et al. EvalNorm: Estimating Batch Normalization Statistics for Evaluation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.

[4] Yoshua Bengio,et al. Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[5] Alexei A. Efros,et al. Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Julien Mairal,et al. Unsupervised Pre-Training of Image Features on Non-Curated Data , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Alexander Kolesnikov,et al. S4L: Self-Supervised Semi-Supervised Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8] Paolo Favaro,et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[9] Matthijs Douze,et al. Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[10] Nuno Vasconcelos,et al. Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11] Michael James,et al. Online Normalization for Training Neural Networks , 2019, NeurIPS.

[12] Kaiming He,et al. Improved Baselines with Momentum Contrastive Learning , 2020, ArXiv.

[13] Kaiming He,et al. Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14] Kaiming He,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Yang You,et al. Large Batch Training of Convolutional Networks , 2017, 1708.03888.

[16] Trevor Darrell,et al. Learning Features by Watching Objects Move , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Julien Mairal,et al. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments , 2020, NeurIPS.

[18] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.

[19] Abhinav Gupta,et al. Unsupervised Learning of Visual Representations Using Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20] Hao Chen,et al. FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21] Jeff Donahue,et al. Large Scale Adversarial Representation Learning , 2019, NeurIPS.

[22] Yuning Jiang,et al. MegDet: A Large Mini-Batch Object Detector , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.

[24] Quoc V. Le,et al. Unsupervised Data Augmentation for Consistency Training , 2019, NeurIPS.

[25] Venu Govindaraju,et al. Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks , 2016, ICML.

[26] Cordelia Schmid,et al. What makes for good views for contrastive learning , 2020, NeurIPS.

[27] Kaiming He,et al. Group Normalization , 2018, ECCV.

[28] Alexei A. Efros,et al. Colorful Image Colorization , 2016, ECCV.

[29] Harri Valpola,et al. Weight-averaged consistency targets improve semi-supervised deep learning results , 2017, ArXiv.

[30] Phillip Isola,et al. Contrastive Multiview Coding , 2019, ECCV.

[31] Thomas Brox,et al. Discriminative Unsupervised Feature Learning with Convolutional Neural Networks , 2014, NIPS.

[32] Sergey Ioffe,et al. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models , 2017, NIPS.

[33] Michal Valko,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.

[34] Razvan Pascanu,et al. BYOL works even without batch statistics , 2020, ArXiv.

[35] Honglak Lee,et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[36] Yoshua Bengio,et al. Learning deep representations by mutual information estimation and maximization , 2018, ICLR.

[37] Chengxu Zhuang,et al. Local Aggregation for Unsupervised Learning of Visual Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38] Xinlei Chen,et al. Exploring Simple Siamese Representation Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[40] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.

[41] Laurens van der Maaten,et al. Self-Supervised Learning of Pretext-Invariant Representations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[43] David Berthelot,et al. FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence , 2020, NeurIPS.

[44] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[45] Shin Ishii,et al. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46] Nitish Srivastava. Unsupervised Learning of Visual Representations using Videos , 2015 .

[47] Ali Razavi,et al. Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.

[48] Ross B. Girshick,et al. LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49] Andrea Vedaldi,et al. Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[50] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[51] Stella X. Yu,et al. Unsupervised Feature Learning via Non-parametric Instance Discrimination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53] Alexei A. Efros,et al. Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54] R Devon Hjelm,et al. Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.

[55] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[56] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57] Junnan Li,et al. Prototypical Contrastive Learning of Unsupervised Representations , 2020, ICLR.

[58] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[59] Jian Sun,et al. Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization , 2020, ICLR.