Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks

In recent years, visual recognition on challenging long-tailed distributions, where classes often exhibit extremely imbalanced frequencies, has made great progress mostly based on various complex paradigms (e.g., meta learning). Apart from these complex methods, simple refinements on training procedure also make contributions. These refinements, also called tricks, are minor but effective, such as adjustments in the data distribution or loss functions. However, different tricks might conflict with each other. If users apply these long-tail related tricks inappropriately, it could cause worse recognition accuracy than expected. Unfortunately, there has not been a scientific guideline of these tricks in the literature. In this paper, we first collect existing tricks in long-tailed visual recognition and then perform extensive and systematic experiments, in order to give a detailed experimental guideline and obtain an effective combination of these tricks. Furthermore, we also propose a novel data augmentation approach based on class activation maps for long-tailed recognition, which can be friendly combined with re-sampling methods and shows excellent results. By assembling these tricks scientifically, we can outperform state-of-the-art methods on four long-tailed benchmark datasets, including ImageNet-LT and iNaturalist 2018. Our code is open-source and available at https://github.com/zhangyongshun/BagofTricks-LT.

[1]  Mei Wang,et al.  Fair Loss: Margin-Aware Reinforcement Learning for Deep Face Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Ioannis A. Kakadiaris,et al.  Deep Imbalanced Attribute Classification using Visual Attention Aggregation , 2018, ECCV.

[3]  Ioannis Mitliagkas,et al.  Manifold Mixup: Better Representations by Interpolating Hidden States , 2018, ICML.

[4]  Chris. Drummond,et al.  C 4 . 5 , Class Imbalance , and Cost Sensitivity : Why Under-Sampling beats OverSampling , 2003 .

[5]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Michael K. Ng,et al.  Oversampling for Imbalanced Data via Optimal Transport , 2019, AAAI.

[7]  Guiguang Ding,et al.  Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification , 2020, ECCV.

[8]  Kaiming He,et al.  Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.

[9]  Xiaogang Wang,et al.  Factors in Finetuning Deep Model for Object Detection with Long-Tail Distribution , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[13]  Nathalie Japkowicz,et al.  The class imbalance problem: A systematic study , 2002, Intell. Data Anal..

[14]  Yu Li,et al.  Classification Calibration for Long-tail Instance Segmentation , 2019, ArXiv.

[15]  Zhi Zhang,et al.  Bag of Freebies for Training Object Detection Neural Networks , 2019, ArXiv.

[16]  Yuanjie Zheng,et al.  Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification , 2019, AAAI.

[17]  María Pérez-Ortiz,et al.  Exploiting Synthetically Generated Data with Semi-Supervised Learning for Small and Imbalanced Datasets , 2019, AAAI.

[18]  Hongyi Zhang,et al.  mixup: Beyond Empirical Risk Minimization , 2017, ICLR.

[19]  Yang Song,et al.  Class-Balanced Loss Based on Effective Number of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Bolei Zhou,et al.  Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Lina Yao,et al.  One-Shot Learning for Long-Tail Visual Relation Detection , 2020, AAAI.

[22]  Hui Han,et al.  Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning , 2005, ICIC.

[23]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Ross B. Girshick,et al.  LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Weihong Deng,et al.  Unequal-Training for Deep Face Recognition With Long-Tailed Noisy Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Martial Hebert,et al.  Learning to Model the Tail , 2017, NIPS.

[27]  Atsuto Maki,et al.  A systematic study of the class imbalance problem in convolutional neural networks , 2017, Neural Networks.

[28]  Wai Lam,et al.  Data Augmentation Based on Adversarial Autoencoder Handling Imbalance for Learning to Rank , 2019, AAAI.

[29]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[30]  Yang Song,et al.  The iNaturalist Species Classification and Detection Dataset , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[31]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Yang Song,et al.  Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Ajinkya More,et al.  Survey of resampling techniques for improving classification performance in unbalanced datasets , 2016, ArXiv.

[34]  Xuanjing Huang,et al.  Trainable Undersampling for Class-Imbalance Learning , 2019, AAAI.

[35]  Qi Tian,et al.  Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data , 2019, ArXiv.

[36]  Xiu-Shen Wei,et al.  BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Xiu-Shen Wei,et al.  Deep Learning for Fine-Grained Image Analysis: A Survey , 2019, ArXiv.

[38]  Haibin Ling,et al.  Feature Space Augmentation for Long-Tailed Data , 2020, ECCV.

[39]  Xiao Zhang,et al.  Range Loss for Deep Face Recognition with Long-Tailed Training Data , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Ming-Hsuan Yang,et al.  Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition From a Domain Adaptation Perspective , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[42]  Zhi Zhang,et al.  Bag of Tricks for Image Classification with Convolutional Neural Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Xing Ji,et al.  CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Xiu-Shen Wei,et al.  Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval , 2016, IEEE Transactions on Image Processing.

[45]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Stella X. Yu,et al.  Large-Scale Long-Tailed Recognition in an Open World , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Qingming Huang,et al.  Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks , 2015, ECCV.

[48]  Colin Wei,et al.  Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss , 2019, NeurIPS.

[49]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[50]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[51]  Marcus Rohrbach,et al.  Decoupling Representation and Classifier for Long-Tailed Recognition , 2020, ICLR.