AutoAssign: Differentiable Label Assignment for Dense Object Detection

In this paper, we propose an anchor-free object detector with a fully differentiable label assignment strategy, named AutoAssign. It automatically determines positive/negative samples by generating positive and negative weight maps to modify each location's prediction dynamically. Specifically, we present a center weighting module to adjust the category-specific prior distributions and a confidence weighting module to adapt the specific assign strategy of each instance. The entire label assignment process is differentiable and requires no additional modification to transfer to different datasets and tasks. Extensive experiments on MS COCO show that our method steadily surpasses other best sampling strategies by $ \sim $ 1\% AP with various backbones. Moreover, our best model achieves 52.1\% AP, outperforming all existing one-stage detectors. Besides, experiments on other datasets, \emph{e.g.}, PASCAL VOC, Objects365, and WiderFace, demonstrate the broad applicability of AutoAssign.

[1]  Shuo Yang,et al.  WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Larry S. Davis,et al.  Learning From Noisy Anchors for One-Stage Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[5]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[7]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[8]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[9]  Kaiming He,et al.  Rethinking ImageNet Pre-Training , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Tong Yang,et al.  MetaAnchor: Learning to Detect Objects with Customized Anchors , 2018, NeurIPS.

[11]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Hee Seok Lee,et al.  Probabilistic Anchor Assignment with IoU Prediction for Object Detection , 2020, ECCV.

[13]  Yi Yang,et al.  DenseBox: Unifying Landmark Localization with End to End Object Detection , 2015, ArXiv.

[14]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Shifeng Zhang,et al.  Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Yuning Jiang,et al.  UnitBox: An Advanced Object Detection Network , 2016, ACM Multimedia.

[17]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Stephen Lin,et al.  RepPoints: Point Set Representation for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[21]  Jian Sun,et al.  Objects365: A Large-Scale, High-Quality Dataset for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Xingyi Zhou,et al.  Objects as Points , 2019, ArXiv.

[23]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Rongrong Ji,et al.  FreeAnchor: Learning to Match Anchors for Visual Object Detection , 2019, NeurIPS.

[28]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Zhiqiang Shen,et al.  Soft Anchor-Point Object Detection , 2019, ECCV.

[30]  Kai Chen,et al.  Region Proposal by Guided Anchoring , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[32]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[33]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.