Class-Balanced Distillation for Long-Tailed Visual Recognition

Real-world imagery is often characterized by a significant imbalance in the number of images per class, leading to long-tailed distributions. A simple and effective approach to long-tailed visual recognition is to learn the feature representation and the classifier separately, with instance-balanced and class-balanced sampling, respectively. In this work, we introduce a new framework built on the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting. Our main contribution is a new training method, referred to as Class-Balanced Distillation (CBD), that leverages knowledge distillation to enhance feature representations. CBD allows the feature representation to evolve in the second training stage, guided by the teacher learned in the first stage. The second stage uses class-balanced sampling to focus on under-represented classes. The framework naturally accommodates multiple teachers, unlocking the information from an ensemble of models to enhance recognition capabilities. Our experiments show that the proposed technique consistently outperforms the state of the art on long-tailed recognition benchmarks such as ImageNet-LT, iNaturalist17 and iNaturalist18. The experiments also show that, unlike most existing work, our method does not sacrifice the accuracy of head classes to improve the performance of tail classes.
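
The two ingredients of the second stage described above, class-balanced sampling and a loss that keeps the student close to the first-stage teacher, can be illustrated with a minimal Python sketch. The abstract does not give the form of the distillation term, so the cosine distance between student and teacher features, the weight `alpha`, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math
import random
from collections import defaultdict


def class_balanced_sample(labels, num_samples, rng):
    """Draw example indices so that every class is equally likely to be picked."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    picks = []
    for _ in range(num_samples):
        chosen_class = rng.choice(classes)               # uniform over classes
        picks.append(rng.choice(by_class[chosen_class]))  # uniform within the class
    return picks


def cosine_distance(u, v):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed stably."""
    peak = max(logits)
    log_partition = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_partition - logits[target]


def stage_two_loss(student_feat, teacher_feat, logits, target, alpha=0.5):
    """Assumed combination: classification loss plus a feature distillation term.

    alpha is a hypothetical mixing weight; alpha=0 ignores the teacher.
    """
    classification = cross_entropy(logits, target)
    distillation = cosine_distance(student_feat, teacher_feat)
    return (1.0 - alpha) * classification + alpha * distillation
```

With 95 head-class and 5 tail-class examples, `class_balanced_sample` draws about half of its indices from the tail class, whereas instance sampling would draw about 5%. In an actual second stage, `stage_two_loss` would be evaluated on batches drawn this way, with the teacher features coming from the frozen first-stage model.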
