The Devil is in Classification: A Simple Framework for Long-tail Object Detection and Instance Segmentation

Most existing object instance detection and segmentation models only work well on fairly balanced benchmarks where per-category training sample numbers are comparable, such as COCO. They tend to suffer performance drop on realistic datasets that are usually long-tailed. This work aims to study and address such open challenges. Specifically, we systematically investigate performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset, and unveil that a major cause is the inaccurate classification of object proposals. Based on such an observation, we first consider various techniques for improving long-tail classification performance which indeed enhance instance segmentation results. We then propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach. Without bells and whistles, it significantly boosts the performance of instance segmentation for tail classes on the recent LVIS dataset and our sampled COCO-LT dataset. Our analysis provides useful insights for solving long-tail instance detection and segmentation problems, and the straightforward \emph{SimCal} method can serve as a simple but strong baseline. With the method we have won the 2019 LVIS challenge. Codes and models are available at https://github.com/twangnh/SimCal.

[1]  Mohammed Bennamoun,et al.  Cost-Sensitive Learning of Deep Feature Representations From Imbalanced Data , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[2]  Yang Song,et al.  Class-Balanced Loss Based on Effective Number of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Colin Wei,et al.  Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss , 2019, NeurIPS.

[4]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Atsuto Maki,et al.  A systematic study of the class imbalance problem in convolutional neural networks , 2017, Neural Networks.

[6]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[7]  Chen Huang,et al.  Learning Deep Representation for Imbalanced Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Yu Li,et al.  Classification Calibration for Long-tail Instance Segmentation , 2019, ArXiv.

[9]  Anders Krogh,et al.  Neural Network Ensembles, Cross Validation, and Active Learning , 1994, NIPS.

[10]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Kai Chen,et al.  MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.

[12]  Barbara Caputo,et al.  DeepNCM: Deep Nearest Class Mean Classifiers , 2018, ICLR.

[13]  W. Reed The Pareto, Zipf and other power laws , 2001 .

[14]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[15]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Kai Chen,et al.  Hybrid Task Cascade for Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Sharath Pankanti,et al.  RepMet: Representative-Based Metric Learning for Classification and Few-Shot Object Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Marcus Rohrbach,et al.  Decoupling Representation and Classifier for Long-Tailed Recognition , 2020, ICLR.

[20]  Qingming Huang,et al.  Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks , 2015, ECCV.

[21]  Martial Hebert,et al.  Learning to Model the Tail , 2017, NIPS.

[22]  Yanqing Zhang,et al.  SVMs Modeling for Highly Imbalanced Classification , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[23]  Yuning Jiang,et al.  Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[24]  Kai Ming Ting,et al.  A Comparative Study of Cost-Sensitive Boosting Algorithms , 2000, ICML.

[25]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[26]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[28]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[29]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Ross B. Girshick,et al.  LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Chen Huang,et al.  Deep Imbalanced Learning for Face Recognition and Attribute Prediction , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Haibo He,et al.  Learning from Imbalanced Data , 2009, IEEE Transactions on Knowledge and Data Engineering.

[33]  Xin Wang,et al.  Few-Shot Object Detection via Feature Reweighting , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34]  Ethem Alpaydin,et al.  Multiple networks for function learning , 1993, IEEE International Conference on Neural Networks.

[35]  Yongchao Gong,et al.  Mask Scoring R-CNN , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Hao Chen,et al.  LSTD: A Low-Shot Transfer Detector for Object Detection , 2018, AAAI.

[37]  Sheng Tang,et al.  Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).