Self-supervised Learning is More Robust to Dataset Imbalance

Self-supervised learning (SSL) is a scalable way to learn general visual representations since it learns without labels. However, large-scale unlabeled datasets in the wild often have long-tailed label distributions, and little is known about how SSL behaves on such data. In this work, we systematically investigate self-supervised learning under dataset imbalance. First, we find through extensive experiments that off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations: the performance gap between balanced and imbalanced pre-training with SSL is significantly smaller than the gap with supervised learning, across sample sizes, for both in-domain and, especially, out-of-domain evaluation. Second, toward understanding this robustness, we hypothesize that SSL learns richer features from frequent data: it may learn label-irrelevant-but-transferable features that help classify the rare classes and downstream tasks. In contrast, supervised learning has no incentive to learn features irrelevant to the labels from frequent examples. We validate this hypothesis with semi-synthetic experiments and theoretical analysis in a simplified setting. Third, inspired by the theoretical insights, we devise a re-weighted regularization technique that consistently improves SSL representation quality on imbalanced datasets under several evaluation criteria, closing the small gap between balanced and imbalanced datasets with the same number of examples.
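
To make the balanced-vs-imbalanced pre-training comparison concrete, below is a minimal sketch of deriving a long-tailed pre-training set from a balanced labeled dataset. The exponential class-size profile and the `imbalance_ratio` parameter are common conventions in long-tail benchmarks; they are assumptions here, not necessarily this paper's exact protocol. Note that the labels are used only to construct the split; SSL pre-training itself never sees them.

```python
# Sketch: exponentially long-tailed subsample of a balanced dataset.
import numpy as np


def long_tail_indices(labels, imbalance_ratio=100, seed=0):
    """Subsample indices so class sizes decay exponentially.

    labels:           1-D array of integer class labels (balanced dataset).
    imbalance_ratio:  n_max / n_min after subsampling.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_max = np.bincount(labels).max()  # size of the largest class
    keep = []
    for rank, c in enumerate(classes):
        # Class sizes interpolate from n_max down to n_max / imbalance_ratio.
        frac = imbalance_ratio ** (-rank / max(len(classes) - 1, 1))
        n_c = max(int(n_max * frac), 1)
        idx = np.flatnonzero(labels == c)
        keep.append(rng.choice(idx, size=min(n_c, len(idx)), replace=False))
    return np.concatenate(keep)
```

The abstract describes the proposed fix only as "a re-weighted regularization technique," so the sketch below is one plausible instantiation of that idea rather than the paper's formulation: per-example weights inversely proportional to a kernel density estimate in representation space up-weight a regularizer on rare (low-density) examples. The Gaussian density estimator, the weighted L2 regularizer, and all names (`density_weights`, `reweighted_loss`, `bandwidth`, `reg_coeff`) are illustrative assumptions.

```python
# Sketch: per-example re-weighted regularizer added to a base SSL loss.
import torch
import torch.nn.functional as F


def density_weights(features, bandwidth=0.5):
    """Weights w_i proportional to 1 / p_hat(z_i), normalized to mean 1."""
    z = F.normalize(features.detach(), dim=1)  # unit-norm representations
    sq_dists = torch.cdist(z, z).pow(2)        # pairwise squared distances
    # Gaussian kernel density estimate per example.
    density = torch.exp(-sq_dists / (2 * bandwidth ** 2)).mean(dim=1)
    w = 1.0 / density.clamp_min(1e-8)          # rare examples get large weight
    return w / w.mean()


def reweighted_loss(ssl_loss, features, reg_coeff=1e-3):
    """Base SSL loss plus a density-re-weighted L2 regularizer (illustrative)."""
    w = density_weights(features)
    reg = (w * features.pow(2).sum(dim=1)).mean()
    return ssl_loss + reg_coeff * reg
```

In this sketch the weights are computed from the current batch's representations and detached, so gradients flow only through the regularizer itself, not through the density estimate.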
