Directional Self-supervised Learning for Heavy Image Augmentations

Despite the large family of image augmentations, only a few cherry-picked, robust augmentation policies benefit self-supervised image representation learning. In this paper, we propose a directional self-supervised learning paradigm (DSSL) that is compatible with significantly more augmentations. Specifically, we apply heavy augmentation policies on top of views already lightly augmented by standard augmentations to generate harder views (HV), which typically deviate further from the original image than the lightly augmented standard views (SV). Unlike previous methods that pair all augmented views equally and symmetrically maximize their similarities, DSSL treats the augmented views of the same instance as a partially ordered set (with directions SV↔SV and SV←HV) and equips it with a directional objective function that respects the derived relationships among views. DSSL can be implemented in a few lines of code and is highly flexible across popular self-supervised learning frameworks, including SimCLR, SimSiam, and BYOL. Extensive experiments on CIFAR and ImageNet demonstrate that DSSL stably improves these baselines while remaining compatible with a much wider range of augmentations.
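To make the directional objective concrete, here is a minimal PyTorch-style sketch of how the SV↔SV and SV←HV terms could be combined on top of a SimSiam-style encoder/predictor pair. The names `encoder`, `predictor`, and the view arguments are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Minimal sketch of a DSSL-style directional loss (illustrative only;
# `encoder` and `predictor` are assumed SimSiam-style modules).
import torch
import torch.nn.functional as F

def neg_cosine(p, z):
    # SimSiam-style negative cosine similarity; z is detached so the
    # target branch receives no gradient.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def dssl_loss(encoder, predictor, sv1, sv2, hv):
    """sv1, sv2: two lightly (standard) augmented views.
    hv: a heavily augmented view of the same instance."""
    z1, z2, zh = encoder(sv1), encoder(sv2), encoder(hv)
    p1, p2, ph = predictor(z1), predictor(z2), predictor(zh)

    # SV <-> SV: the usual symmetric term between standard views.
    loss_sv = 0.5 * (neg_cosine(p1, z2) + neg_cosine(p2, z1))

    # SV <- HV: directional term; only the HV branch is optimized,
    # while the SV representation serves as a fixed (stop-gradient)
    # target.
    loss_hv = neg_cosine(ph, z1)

    return loss_sv + loss_hv
```

The key asymmetry in this sketch is that the heavy view is pulled toward a detached standard-view target, so gradients never push the SV representations toward the (more heavily distorted) HV.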

[1] Francesc Moreno-Noguer et al. 3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning. ECCV, 2020.

[2] Xinlei Chen et al. Exploring Simple Siamese Representation Learning. CVPR, 2021.

[3] Quoc V. Le et al. AutoAugment: Learning Augmentation Strategies From Data. CVPR, 2019.

[4] Pietro Perona et al. Microsoft COCO: Common Objects in Context. ECCV, 2014.

[5] Junnan Li et al. Prototypical Contrastive Learning of Unsupervised Representations. ICLR, 2021.

[6] Paolo Favaro et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. ECCV, 2016.

[7] Stella X. Yu et al. Unsupervised Feature Learning via Non-parametric Instance Discrimination. CVPR, 2018.

[8] Phillip Isola et al. Contrastive Multiview Coding. ECCV, 2020.

[9] Michael S. Bernstein et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 2015.

[10] Shih-Fu Chang et al. Unsupervised Embedding Learning via Invariant and Spreading Instance Feature. CVPR, 2019.

[11] Michael Tschannen et al. On Mutual Information Maximization for Representation Learning. ICLR, 2020.

[12] Geoffrey E. Hinton et al. A Simple Framework for Contrastive Learning of Visual Representations. ICML, 2020.

[13] Guo-Jun Qi et al. Contrastive Learning With Stronger Augmentations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.

[14] Quoc V. Le et al. RandAugment: Practical Automated Data Augmentation with a Reduced Search Space. CVPR Workshops, 2020.

[15] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images. Technical report, 2009.

[16] Seong Joon Oh et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features. ICCV, 2019.

[17] Ross B. Girshick et al. Mask R-CNN. ICCV, 2017.

[18] Thomas Brox et al. Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

[19] Mario A. Nascimento et al. UniformAugment: A Search-free Probabilistic Data Augmentation Approach. arXiv preprint, 2020.

[20] Julien Mairal et al. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. NeurIPS, 2020.

[21] Graham W. Taylor et al. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv preprint, 2017.

[22] Cordelia Schmid et al. What Makes for Good Views for Contrastive Learning? NeurIPS, 2020.

[23] Shaogang Gong et al. Unsupervised Deep Learning by Neighbourhood Discovery. ICML, 2019.

[24] Michal Valko et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. NeurIPS, 2020.

[25] Kaiming He et al. Momentum Contrast for Unsupervised Visual Representation Learning. CVPR, 2020.

[26] Tao Mei et al. Destruction and Construction Learning for Fine-Grained Image Recognition. CVPR, 2019.