Toward a Geometrical Understanding of Self-supervised Contrastive Learning

Self-supervised learning (SSL) is currently one of the premier techniques for creating data representations that are actionable for transfer learning in the absence of human annotations. Despite their success, the underlying geometry of these representations remains elusive, which obfuscates the quest for more robust, trustworthy, and interpretable models. In particular, mainstream SSL techniques rely on a specific deep neural network architecture with two cascaded neural networks: the encoder and the projector. When used for transfer learning, the projector is discarded since empirical results show that its representation generalizes more poorly than the encoder's. In this paper, we investigate this curious phenomenon and analyze how the strength of the data augmentation policies affects the data embedding. We discover a non-trivial relation between the encoder, the projector, and the data augmentation strength: with increasingly larger augmentation policies, the projector, rather than the encoder, is more strongly driven to become invariant to the augmentations. It does so by eliminating crucial information about the data, learning to project the data into a low-dimensional space that is a noisy estimate of the data-manifold tangent plane in the encoder representation. This analysis is substantiated through a geometrical perspective with theoretical and empirical results.

Downstream-task performance is evaluated by using the representation extracted in the encoder space. Large, moderate, and small augmentations refer to the strength of the data augmentation applied to the input samples (see Table 2 for each configuration). The smaller the strength of the data augmentation policy, the less the projector suffers from dimensional collapse; conversely, when the projector undergoes a substantial dimensional collapse, the encoder representation becomes well suited for the downstream task. In this work, we demystify this intriguing relationship between augmentation strength, encoder embedding, and projector geometry.
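The dimensional-collapse diagnostic behind these statements can be made concrete. Below is a minimal sketch, not the authors' code, of how one might compare the encoder and projector embeddings: compute the singular-value spectrum of the centered embedding matrix at each stage and summarize it with an entropy-based effective rank. The module shapes and the effective_rank helper are illustrative assumptions; in the paper's setting, the encoder-projector pair would first be trained with a contrastive objective under a given augmentation strength (see Table 2) before inspecting these spectra.

```python
# Minimal sketch (illustrative, not the authors' implementation): measure how
# low-dimensional the encoder and projector embeddings are via an
# entropy-based effective rank of their singular-value spectra.
import torch
import torch.nn as nn

torch.manual_seed(0)

D_IN, D_ENC, D_PROJ, N = 32, 128, 64, 2048  # toy dimensions and batch size

# SimCLR-style cascade: backbone encoder followed by a small MLP projector.
encoder = nn.Sequential(nn.Linear(D_IN, 256), nn.ReLU(), nn.Linear(256, D_ENC))
projector = nn.Sequential(nn.Linear(D_ENC, 256), nn.ReLU(), nn.Linear(256, D_PROJ))

def effective_rank(z: torch.Tensor) -> float:
    """Entropy-based effective rank of the centered embedding matrix.

    A value much smaller than the ambient dimension indicates that the
    representation occupies a low-dimensional subspace (dimensional collapse).
    """
    z = z - z.mean(dim=0, keepdim=True)
    s = torch.linalg.svdvals(z)              # singular values of the N x d matrix
    p = s / s.sum()                           # normalized spectrum
    entropy = -(p * torch.log(p + 1e-12)).sum()
    return torch.exp(entropy).item()          # lies between 1 and d

with torch.no_grad():
    x = torch.randn(N, D_IN)                  # stand-in for a batch of augmented views
    h = encoder(x)                            # encoder embedding
    z = projector(h)                          # projector embedding
    print(f"encoder   effective rank: {effective_rank(h):.1f} / {D_ENC}")
    print(f"projector effective rank: {effective_rank(z):.1f} / {D_PROJ}")
```

With untrained random weights both spectra stay close to full rank; the paper's observation concerns the trained networks, where stronger augmentation policies push the projector's effective rank down while the encoder's remains comparatively high, which is why the encoder representation is the one kept for downstream transfer.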
