Association Loss for Visual Object Detection

Convolutional neural networks (CNNs) are a popular choice for visual object detection, where two sub-nets are often used to perform object classification and localization separately. However, the intrinsic relation between the localization and classification sub-nets has not been explicitly exploited for object detection. In this letter, we propose a novel association loss, namely the proxy squared error (PSE) loss, which entangles the two sub-nets and thus uses the dependency between the classification and localization scores obtained from them to improve detection performance. We evaluate the proposed loss on the MS-COCO dataset and compare it with the loss used in a recent baseline, i.e., the fully convolutional one-stage (FCOS) detector. Compared with the FCOS baseline, our method improves AP from 33.8 to 35.4 and AP75 from 35.4 to 37.8.
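The abstract does not state the PSE formula. As a rough illustration only, the sketch below assumes an association term that penalizes the squared gap between a positive location's classification score and the IoU of its regressed box, using the FCOS (l, t, r, b) box parameterization [18] and the UnitBox-style IoU [17]. The names `ltrb_iou`, `association_loss` and `pearson` are hypothetical, this is not the authors' loss, and the Pearson helper is included only because the reference list cites the Pearson correlation coefficient [2], [8] as one way to measure how strongly the two scores depend on each other.

```python
import math
from typing import Sequence, Tuple

# A box in FCOS-style (l, t, r, b) form: distances from one feature-map
# location to the left, top, right and bottom sides of the box.
LTRB = Tuple[float, float, float, float]


def ltrb_iou(pred: LTRB, target: LTRB) -> float:
    """IoU of two (l, t, r, b) boxes regressed from the same location."""
    pl, pt, pr, pb = pred
    tl, tt, tr, tb = target
    pred_area = (pl + pr) * (pt + pb)
    target_area = (tl + tr) * (tt + tb)
    # Both boxes contain the shared location, so their overlap spans
    # min(left) + min(right) horizontally and min(top) + min(bottom) vertically.
    inter = (min(pl, tl) + min(pr, tr)) * (min(pt, tt) + min(pb, tb))
    union = pred_area + target_area - inter
    return inter / union if union > 0 else 0.0


def association_loss(
    scores: Sequence[float],
    preds: Sequence[LTRB],
    targets: Sequence[LTRB],
) -> float:
    """Hypothetical association term (NOT the paper's PSE formula): mean
    squared gap between each positive location's classification score and
    the IoU of its predicted box with the ground-truth box."""
    if not scores:
        return 0.0
    gaps = [(s - ltrb_iou(p, t)) ** 2 for s, p, t in zip(scores, preds, targets)]
    return sum(gaps) / len(gaps)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. between classification scores and IoUs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("Pearson correlation is undefined for constant inputs")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    scores = [0.9, 0.3]
    preds = [(2.0, 2.0, 2.0, 2.0), (1.0, 1.0, 1.0, 1.0)]
    targets = [(2.0, 2.0, 2.0, 2.0), (2.0, 2.0, 2.0, 2.0)]
    # IoUs are 1.0 and 0.25, so the loss is ((0.9 - 1)^2 + (0.3 - 0.25)^2) / 2.
    print(association_loss(scores, preds, targets))  # ≈ 0.00625
```

In a trainable detector such a term would be written in an autodiff framework, and whether its gradient reaches the classification branch, the regression branch, or both is a design choice that the abstract alone does not settle.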

References

[1] Chenchen Zhu et al. Feature Selective Anchor-Free Module for Single-Shot Object Detection, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Jacob Benesty et al. On the Importance of the Pearson Correlation Coefficient in Noise Reduction, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[3] Wei Liu et al. SSD: Single Shot MultiBox Detector, 2015, ECCV.

[4] Zhaowei Cai and Nuno Vasconcelos. Cascade R-CNN: Delving Into High Quality Object Detection, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5] Yihui He et al. Bounding Box Regression With Uncertainty for Accurate Object Detection, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Yong Man Ro et al. Towards High-Performance Object Detection: Task-Specific Design Considering Classification and Localization Separation, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7] Tsung-Yi Lin et al. Focal Loss for Dense Object Detection, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Gerhard Nahler et al. Pearson Correlation Coefficient, 2020, Definitions.

[9] Tsung-Yi Lin et al. Feature Pyramid Networks for Object Detection, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11] Kaipeng Zhang et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks, 2016, IEEE Signal Processing Letters.

[12] Kaiming He et al. Mask R-CNN, 2017, arXiv:1703.06870.

[13] Borui Jiang et al. Acquisition of Localization Confidence for Accurate Object Detection, 2018, ECCV.

[14] Tsung-Yi Lin et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.

[15] Shaoqing Ren et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Joseph Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Jiahui Yu et al. UnitBox: An Advanced Object Detection Network, 2016, ACM Multimedia.

[18] Zhi Tian et al. FCOS: Fully Convolutional One-Stage Object Detection, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19] Kaiming He et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).