RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank

Joint-Embedding Self-Supervised Learning (JE-SSL) has seen rapid development, with the emergence of many method variations but few principled guidelines to help practitioners deploy them successfully. The main reason for this pitfall is that JE-SSL, by design, performs no input reconstruction and therefore offers no visual cues of unsuccessful training. Since loss values are equally uninformative, it becomes difficult to deploy JE-SSL on a new dataset for which no labels are available to judge the quality of the learned representations. In this study, we develop a simple unsupervised criterion that is indicative of the quality of learned JE-SSL representations: their effective rank. Albeit simple and computationally friendly, this method -- coined RankMe -- allows one to assess the performance of JE-SSL representations, even on different downstream datasets, without requiring any labels. A further benefit of RankMe is that it has no training phase and no hyperparameters to tune. Through thorough empirical experiments involving hundreds of training episodes, we demonstrate how RankMe can be used for hyperparameter selection with nearly no reduction in final performance compared to the current selection methods, which require a dataset's labels. We hope that RankMe will facilitate the deployment of JE-SSL in domains where no labels are available to assess representation quality.
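
Concretely, the effective rank of a matrix is the exponential of the Shannon entropy of its normalized singular-value distribution: with singular values sigma_k and p_k = sigma_k / sum_j sigma_j, the score is exp(-sum_k p_k log p_k). The sketch below, in PyTorch, illustrates one way to compute this quantity for a matrix of embeddings; the function name `rankme` and the `eps` guard are illustrative choices of ours, not taken from the paper's released code.

```python
import torch

def rankme(embeddings: torch.Tensor, eps: float = 1e-7) -> float:
    """Effective rank (Roy & Vetterli, 2007) of an (n_samples, dim) embedding matrix.

    Defined as exp of the Shannon entropy of the normalized singular values;
    it ranges from 1 (fully collapsed representation) up to min(n_samples, dim).
    """
    # Singular values of the embedding matrix.
    s = torch.linalg.svdvals(embeddings)
    # Normalize singular values into a probability distribution.
    p = s / (s.sum() + eps)
    # Effective rank = exp(entropy); eps guards against log(0).
    entropy = -(p * torch.log(p + eps)).sum()
    return torch.exp(entropy).item()
```

Following the protocol described in the abstract, one would extract embeddings for a batch of unlabeled images under each candidate hyperparameter configuration, score each with this criterion, and retain the configuration with the highest effective rank; no downstream labels enter the selection at any point.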
