Resolving Class Imbalance in Object Detection with Weighted Cross Entropy Losses

Object detection is an important task in computer vision which serves a lot of real-world applications such as autonomous driving, surveillance and robotics. Along with the rapid thrive of large-scale data, numerous state-of-the-art generalized object detectors (e.g. Faster R-CNN, YOLO, SSD) were developed in the past decade. Despite continual efforts in model modification and improvement in training strategies to boost detection accuracy, there are still limitations in performance of detectors when it comes to specialized datasets with uneven object class distributions. This originates from the common usage of Cross Entropy loss function for object classification sub-task that simply ignores the frequency of appearance of object class during training, and thus results in lower accuracies for object classes with fewer number of samples. Class-imbalance in general machine learning has been widely studied, however, little attention has been paid on the subject of object detection. In this paper, we propose to explore and overcome such problem by application of several weighted variants of Cross Entropy loss, for examples Balanced Cross Entropy, Focal Loss and Class-Balanced Loss Based on Effective Number of Samples to our object detector. Experiments with BDD100K (a highly class-imbalanced driving database acquired from on-vehicle cameras capturing mostly Car-class objects and other minority object classes such as Bus, Person and Motor) have proven better class-wise performances of detector trained with the afore-mentioned loss functions.

[1]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[2]  Yang Song,et al.  Class-Balanced Loss Based on Effective Number of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Lisha Cui,et al.  MDSSD: multi-scale deconvolutional single shot detector for small objects , 2018, Science China Information Sciences.

[5]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[6]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Martial Hebert,et al.  Learning to Model the Tail , 2017, NIPS.

[11]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Xiongwei Wu,et al.  Recent Advances in Deep Learning for Object Detection , 2019, Neurocomputing.

[13]  Kaiming He,et al.  Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.

[14]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[15]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[16]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Geoffrey E. Hinton,et al.  When Does Label Smoothing Help? , 2019, NeurIPS.

[18]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[19]  P ? ? ? ? ? ? ? % ? ? ? ? , 1991 .

[20]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Atsuto Maki,et al.  A systematic study of the class imbalance problem in convolutional neural networks , 2017, Neural Networks.

[22]  Chen Huang,et al.  Learning Deep Representation for Imbalanced Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).