What shapes the loss landscape of self-supervised learning?

Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL). However, questions remain in our theoretical understanding: When do these collapses occur? What are their mechanisms and causes? We answer these questions by deriving and thoroughly analyzing an analytically tractable theory of SSL loss landscapes. In this theory, we identify the causes of dimensional collapse and study the effects of normalization and bias. Finally, we leverage the interpretability afforded by the analytical theory to understand how dimensional collapse can be beneficial and what affects the robustness of SSL to data imbalance.
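As a concrete illustration (not part of the paper's analysis): complete collapse corresponds to all embeddings coinciding, so the embedding covariance is (near) zero, while dimensional collapse corresponds to a rank-deficient covariance whose spectrum contains vanishing eigenvalues. The minimal NumPy sketch below diagnoses the two modes from that spectrum; the function names, tolerance, and synthetic data are illustrative assumptions.

import numpy as np

def covariance_spectrum(embeddings):
    # Sorted (descending) eigenvalues of the covariance of (N, d) embeddings.
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / len(embeddings)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def collapse_report(embeddings, tol=1e-6):
    spectrum = covariance_spectrum(embeddings)
    if spectrum[0] < tol:
        # All embeddings are (nearly) identical: complete collapse.
        return "complete collapse"
    effective_rank = int((spectrum > tol * spectrum[0]).sum())
    if effective_rank < embeddings.shape[1]:
        # Some embedding directions carry no variance: dimensional collapse.
        return "dimensional collapse (effective rank %d of %d)" % (
            effective_rank, embeddings.shape[1])
    return "no collapse detected"

# Example: embeddings confined to a 3-dimensional subspace of an 8-dimensional space.
rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 8))
print(collapse_report(z))  # -> dimensional collapse (effective rank 3 of 8)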
