Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning

Contrastive self-supervised learning (CSL) has attracted increasing attention for model pre-training on unlabeled data. The resulting CSL models provide instance-discriminative visual features that are uniformly scattered in the feature space. During deployment, the common practice is to directly fine-tune CSL models with the cross-entropy loss, which, however, may not be the optimal strategy. Although cross-entropy tends to separate inter-class features, the resulting models still have limited capability to reduce the intra-class feature scattering inherited from CSL pre-training. In this paper, we investigate whether applying contrastive learning to fine-tuning brings further benefits, and analytically find that optimizing the contrastive loss benefits both discriminative representation learning and model optimization during fine-tuning. Inspired by these findings, we propose Contrast-regularized tuning (Core-tuning), a new approach for fine-tuning CSL models. Instead of simply adding the contrastive loss to the fine-tuning objective, Core-tuning also applies a novel hard pair mining strategy for more effective contrastive fine-tuning, and smooths the decision boundary to better exploit the learned discriminative feature space. Extensive experiments on image classification and semantic segmentation verify the effectiveness of Core-tuning.
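To make the fine-tuning objective concrete, below is a minimal PyTorch-style sketch of contrast-regularized fine-tuning: the standard cross-entropy loss plus a supervised contrastive term computed on L2-normalized features. This is an illustrative sketch under stated assumptions, not the authors' released implementation: it omits Core-tuning's hard pair mining and decision-boundary smoothing, and names such as core_tuning_step, eta, and temperature are hypothetical.

import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    # Supervised contrastive loss on one labeled batch:
    # features: (B, D) unnormalized features; labels: (B,) int class ids.
    z = F.normalize(features, dim=1)                      # project features to the unit sphere
    sim = z @ z.t() / temperature                         # pairwise cosine similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))       # exclude self-similarity
    # Positives are same-label pairs (excluding the anchor itself).
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos_mask.sum(1)
    valid = pos_count > 0                                 # skip anchors with no positives
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    return (-pos_log_prob[valid] / pos_count[valid]).mean()

def core_tuning_step(model, images, labels, optimizer, eta=0.1):
    # One fine-tuning step: cross-entropy plus the contrastive regularizer.
    # Assumes the model returns both classification logits and features.
    logits, features = model(images)
    loss = F.cross_entropy(logits, labels) \
         + eta * supervised_contrastive_loss(features, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Here eta trades off the two terms: cross-entropy drives inter-class separation, while the contrastive term explicitly pulls same-class features together, targeting the intra-class scattering discussed above.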
