Dynamic Label Assignment for Object Detection by Combining Predicted and Anchor IoUs

Label assignment plays a significant role in modern object detection models. Detection models may yield totally different performances with different label assignment strategies. For anchor-based detection models, the IoU threshold between the anchors and their corresponding ground truth bounding boxes is the key element since the positive samples and negative samples are divided by the IoU threshold. Early object detectors simply utilize a fixed threshold for all training samples, while recent detection algorithms focus on adaptive thresholds based on the distribution of the IoUs to the ground truth boxes. In this paper, we introduce a simple and effective approach to perform label assignment dynamically based on the training status with predictions. By introducing the predictions in label assignment, more high-quality samples with higher IoUs to the ground truth objects are selected as the positive samples, which could reduce the discrepancy between the classification scores and the IoU scores, and generate more high-quality boundary boxes. Our approach shows improvements in the performance of the detection models with the adaptive label assignment algorithm and lower bounding box losses for those positive samples, indicating more samples with higher quality predicted boxes are selected as positives. The source code will be available at https://github.com/ZTX-100/DLA-Combined-IoUs.

[1]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[2]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[3]  Guanghui Wang,et al.  A Discriminative Channel Diversification Network for Image Classification , 2021, Pattern Recognition Letters.

[4]  Bin Li,et al.  Deformable DETR: Deformable Transformers for End-to-End Object Detection , 2020, ICLR.

[5]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Jun Li,et al.  Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Jun Li,et al.  Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection , 2020, NeurIPS.

[8]  Zeyi Huang,et al.  Multiple Anchor Learning for Visual Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Peng Gao,et al.  Fast Convergence of DETR with Spatially Modulated Co-Attention , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[11]  Jie Zhou,et al.  SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular images , 2021, Neurocomputing.

[12]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[13]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Cuncong Zhong,et al.  Colonoscopy polyp detection and classification: Dataset creation and comparative evaluations , 2021, PloS one.

[15]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[16]  Xiaogang Wang,et al.  End-to-End Object Detection with Adaptive Clustering Transformer , 2020, BMVC.

[17]  Guanghui Wang,et al.  Real-time Golf Ball Detection and Tracking Based on Convolutional Neural Networks , 2020, 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC).

[18]  Guanghui Wang,et al.  Object Detection with Convolutional Neural Networks , 2019, Deep Learning in Computer Vision.

[19]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[21]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Guanghui Wang,et al.  MDFN: Multi-Scale Deep Feature Learning Network for Object Detection , 2019, Pattern Recognit..

[23]  Depu Meng,et al.  Conditional DETR for Fast Training Convergence , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Hee Seok Lee,et al.  Probabilistic Anchor Assignment with IoU Prediction for Object Detection , 2020, ECCV.

[25]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[27]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Guanghui Wang,et al.  Miti-DETR: Object Detection based on Transformers with Mitigatory Self-Attention Convergence , 2021, ArXiv.

[30]  Ying Wang,et al.  VarifocalNet: An IoU-aware Dense Object Detector , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Guanghui Wang,et al.  Deep Feature Augmentation for Occluded Image Classification , 2020, Pattern Recognit..

[33]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[34]  Benjin Zhu,et al.  AutoAssign: Differentiable Label Assignment for Dense Object Detection , 2020, ArXiv.

[35]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[37]  Guanghui Wang,et al.  Efficient Golf Ball Detection and Tracking Based on Convolutional Neural Networks and Kalman Filter , 2020, ArXiv.

[38]  Shifeng Zhang,et al.  Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Guanghui Wang,et al.  Adaptively Denoising Proposal Collection for Weakly Supervised Object Localization , 2019, Neural Processing Letters.

[40]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[41]  Rongrong Ji,et al.  FreeAnchor: Learning to Match Anchors for Visual Object Detection , 2019, NeurIPS.