Side-Aware Boundary Localization for More Precise Object Detection

Current object detection frameworks mainly rely on bounding box regression to localize objects. Despite the remarkable progress in recent years, the precision of bounding box regression remains unsatisfactory, hence limiting performance in object detection. We observe that precise localization requires careful placement of each side of the bounding box. However, the mainstream approach, which focuses on predicting centers and sizes, is not the most effective way to accomplish this task, especially when there exists displacements with large variance between the anchors and the this http URL this paper, we propose an alternative approach, named as Side-Aware Boundary Localization (SABL), where each side of the bounding box is respectively localized with a dedicated network branch. Moreover, to tackle the difficulty of precise localization in the presence of displacements with large variance, we further propose a two-step localization scheme, which first predicts a range of movement through bucket prediction and then pinpoints the precise position within the predicted bucket. We test the proposed method on both two-stage and single-stage detection frameworks. Replacing the standard bounding box regression branch with the proposed design leads to significant improvements on Faster R-CNN, RetinaNet, and Cascade R-CNN, by 3.0%, 1.6%, and 0.9%, respectively. Code and models will be available at this https URL.

[1]  Hyuk-Jae Lee,et al.  Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Larry S. Davis,et al.  G-CNN: An Iterative Grid Based Object Detector , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Hao Chen,et al.  NAS-FCOS: Fast Neural Architecture Search for Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Xiaoou Tang,et al.  Switchable Whitening for Deep Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[8]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Yuning Jiang,et al.  FoveaBox: Beyond Anchor-based Object Detector , 2019, ArXiv.

[11]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Strategies From Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Ying Chen,et al.  M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network , 2018, AAAI.

[13]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[14]  Kai Chen,et al.  MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.

[15]  Yuning Jiang,et al.  MegDet: A Large Mini-Batch Object Detector , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[18]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, International Journal of Computer Vision.

[19]  Hui Zhou,et al.  Robust Multi-Modality Multi-Object Tracking , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[21]  Nikos Komodakis,et al.  LocNet: Improving Localization Accuracy for Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Kai Chen,et al.  Region Proposal by Guided Anchoring , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Fuchun Sun,et al.  Deep Feature Pyramid Reconfiguration for Object Detection , 2018, ECCV.

[26]  Kai Chen,et al.  Optimizing Video Object Detection via a Scale-Time Lattice , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Stephen Lin,et al.  Deformable ConvNets V2: More Deformable, Better Results , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Xiangyu Zhang,et al.  Bounding Box Regression With Uncertainty for Accurate Object Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Kai Chen,et al.  Prime Sample Attention in Object Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Yuning Jiang,et al.  Acquisition of Localization Confidence for Accurate Object Detection , 2018, ECCV.

[32]  Huajun Feng,et al.  Libra R-CNN: Towards Balanced Learning for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Junjie Yan,et al.  Grid R-CNN , 2018, 1811.12030.

[34]  Cewu Lu,et al.  InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Dahua Lin,et al.  Unifying Identification and Context Learning for Person Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[37]  Junjie Yan,et al.  Grid R-CNN Plus: Faster and Better , 2019, ArXiv.

[38]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Jian Sun,et al.  DetNAS: Backbone Search for Object Detection , 2019, NeurIPS.

[40]  Dahua Lin,et al.  Self-Supervised Learning via Conditional Motion Propagation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Xiaogang Wang,et al.  PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[43]  Quoc V. Le,et al.  NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[45]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Kaiming He,et al.  Rethinking ImageNet Pre-Training , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[47]  Kai Chen,et al.  Hybrid Task Cascade for Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Rongrong Ji,et al.  FreeAnchor: Learning to Match Anchors for Visual Object Detection , 2019, NeurIPS.

[51]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Bo Dai,et al.  Self-Supervised Scene De-Occlusion , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Kai Chen,et al.  CARAFE: Content-Aware ReAssembly of FEatures , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[55]  Stephen Lin,et al.  RepPoints: Point Set Representation for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[56]  Bolei Zhou,et al.  A Graph-Based Framework to Bridge Movies and Synopses , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[57]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).