Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection

Training on datasets with long-tailed distributions has been challenging for major recognition tasks such as classification and detection. Image resampling is typically introduced as a simple but effective approach to this challenge. However, we observe that long-tailed detection differs from classification, since multiple classes may be present in one image. As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object level. We address object-level resampling by introducing an object-centric memory replay strategy based on dynamic, episodic memory banks. Our proposed strategy has two benefits: 1) convenient object-level resampling without significant extra computation, and 2) implicit feature-level augmentation from model updates. We show that image-level and object-level resampling are both important, and we thus unify them in a joint resampling strategy, RIO. Our method outperforms state-of-the-art long-tailed detection and segmentation methods on LVIS v0.5 across various backbones. Code is available at https://github.com/NVlabs/RIO.
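
To make the idea of object-centric memory replay concrete, the following is a minimal sketch (not the authors' implementation; see the repository above for that) of one way such a memory bank could look in PyTorch: a small per-class buffer of detached RoI features that is refreshed from each training batch and replayed as a class-balanced sample for an additional classification loss term. The names ObjectMemoryBank, bank_size, and replay_per_class are hypothetical.

# Minimal sketch of a per-class object-feature memory bank with balanced replay.
from collections import defaultdict, deque
import random
import torch


class ObjectMemoryBank:
    def __init__(self, bank_size: int = 100, replay_per_class: int = 4):
        self.bank_size = bank_size              # max stored features per class
        self.replay_per_class = replay_per_class
        self.banks = defaultdict(lambda: deque(maxlen=bank_size))

    @torch.no_grad()
    def update(self, roi_feats: torch.Tensor, labels: torch.Tensor) -> None:
        # Store detached RoI features of ground-truth boxes, keyed by class label.
        for feat, label in zip(roi_feats, labels):
            self.banks[int(label)].append(feat.detach().cpu())

    def replay(self, device: torch.device):
        # Draw a class-balanced sample of stored features for an extra loss term.
        feats, labels = [], []
        for cls, bank in self.banks.items():
            k = min(self.replay_per_class, len(bank))
            for feat in random.sample(list(bank), k):
                feats.append(feat)
                labels.append(cls)
        if not feats:
            return None, None
        return torch.stack(feats).to(device), torch.tensor(labels, device=device)


# Illustrative use inside a detector's training step (names are placeholders):
#   bank.update(gt_roi_features, gt_labels)
#   replay_feats, replay_labels = bank.replay(gt_roi_features.device)
#   if replay_feats is not None:
#       loss_cls = loss_cls + box_classifier_loss(cls_head(replay_feats), replay_labels)

Because the stored features were produced by earlier versions of the (continually updated) backbone, replaying them also acts as an implicit feature-level augmentation, which is the second benefit noted in the abstract.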
