A Note on Connecting Barlow Twins with Negative-Sample-Free Contrastive Learning

Barlow Twins [Zbontar et al., 2021] is a recently proposed self-supervised learning (SSL) method that encourages the representations of distorted (augmented) views of a sample to be similar, while reducing redundancy among the components of the representation vector. Compared with prior state-of-the-art SSL methods, Barlow Twins exhibits two notable properties. On one hand, its algorithm requires no explicit construction of negative sample pairs and does not require large training batch sizes, both of which are characteristics commonly seen in recent non-contrastive SSL methods (e.g., BYOL [Grill et al., 2020] and SimSiam [Chen and He, 2020]). On the other hand, it avoids the symmetry-breaking network designs for the distorted samples that have been found crucial for these non-contrastive approaches to avoid learning collapsed representations. We note that such symmetry-breaking designs are also unnecessary for recent contrastive SSL methods (e.g., SimCLR [Chen et al., 2020])1. A natural question therefore arises: what makes Barlow Twins an outlier among existing SSL algorithms?
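
To make the redundancy-reduction mechanism concrete, the following is a minimal PyTorch-style sketch of the Barlow Twins objective (following Zbontar et al., 2021): the cross-correlation matrix between the embeddings of the two distorted views is pushed toward the identity, so the diagonal term enforces invariance across views while the off-diagonal term decorrelates the components of the representation. The function name, the batch and embedding sizes in the usage lines, and the off-diagonal weight `lambd` are illustrative choices, not values prescribed by this note.

```python
import torch


def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lambd: float = 5e-3) -> torch.Tensor:
    """Redundancy-reduction loss between projector outputs of two distorted views.

    z_a, z_b: (batch_size, dim) embeddings of the two augmented views.
    lambd:    weight on the off-diagonal (redundancy) term; the value here is illustrative.
    """
    n, d = z_a.shape
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(dim=0)) / z_a.std(dim=0)
    z_b = (z_b - z_b.mean(dim=0)) / z_b.std(dim=0)

    # Empirical cross-correlation matrix between the two views (d x d).
    c = (z_a.T @ z_b) / n

    # Invariance term: diagonal entries should be 1 (the two views agree per dimension).
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    # Redundancy-reduction term: off-diagonal entries should be 0 (decorrelated dimensions).
    off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()
    return on_diag + lambd * off_diag


# Usage sketch with random stand-in embeddings (batch of 256, 128-dim projector outputs).
z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
loss = barlow_twins_loss(z1, z2)
```

Note that no negative pairs, large batches, or asymmetric network components (stop-gradients, momentum encoders, predictor heads) appear in this objective, which is precisely the combination of properties discussed above.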

[1] Xinlei Chen and Kaiming He. Exploring Simple Siamese Representation Learning. CVPR, 2021.

[2] Arthur Gretton et al. Measuring Statistical Dependence with Hilbert-Schmidt Norms. ALT, 2005.

[3] Ting Chen et al. A Simple Framework for Contrastive Learning of Visual Representations. ICML, 2020.

[4] Ya Le et al. Tiny ImageNet Visual Recognition Challenge. 2015.

[5] Arthur Gretton et al. A Kernel Two-Sample Test. Journal of Machine Learning Research, 2012.

[6] Naftali Tishby et al. The Information Bottleneck Method. arXiv, 2000.

[7] Yao-Hung Hubert Tsai et al. Self-supervised Representation Learning with Relative Predictive Coding. ICLR, 2021.

[8] Jure Zbontar et al. Barlow Twins: Self-Supervised Learning via Redundancy Reduction. ICML, 2021.

[9] Sherjil Ozair et al. Wasserstein Dependency Measure for Representation Learning. NeurIPS, 2019.

[10] Kaiming He et al. Deep Residual Learning for Image Recognition. CVPR, 2016.

[11] Yao-Hung Hubert Tsai et al. Self-supervised Learning from a Multi-view Perspective. ICLR, 2021.

[12] Alex Krizhevsky et al. Learning Multiple Layers of Features from Tiny Images. 2009.

[13] Jean-Bastien Grill et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. NeurIPS, 2020.

[14] R. Devon Hjelm et al. Learning Deep Representations by Mutual Information Estimation and Maximization. ICLR, 2019.

[15] Kaiming He et al. Momentum Contrast for Unsupervised Visual Representation Learning. CVPR, 2020.