Towards Accurate One-Stage Object Detection With AP-Loss

One-stage object detectors are trained by optimizing classification-loss and localization-loss simultaneously, with the former suffering much from extreme foreground-background class imbalance issue due to the large number of anchors. This paper alleviates this issue by proposing a novel framework to replace the classification task in one-stage detectors with a ranking task, and adopting the Average-Precision loss (AP-loss) for the ranking problem. Due to its non-differentiability and non-convexity, the AP-loss cannot be optimized directly. For this purpose, we develop a novel optimization algorithm, which seamlessly combines the error-driven update scheme in perceptron learning and backpropagation algorithm in deep networks. We verify good convergence property of the proposed algorithm theoretically and empirically. Experimental results demonstrate notable performance improvement in state-of-the-art one-stage detectors based on AP-loss over different kinds of classification-losses on various benchmarks, without changing the network architectures.

[1]  Bo Wang,et al.  Single-Shot Object Detection with Enriched Semantics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Patrick Dupont,et al.  The AdaTron: An Adaptive Perceptron Algorithm , 2017 .

[4]  Weiyao Lin,et al.  Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages , 2018, BMVC.

[5]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[6]  Mun-Cheon Kang,et al.  Parallel Feature Pyramid Network for Object Detection , 2018, ECCV.

[7]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  C. V. Jawahar,et al.  Efficient Optimization for Rank-Based Loss Functions , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[11]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[14]  A. Wendemuth LEARNING THE UNLEARNABLE , 1995 .

[15]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[16]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[18]  Jiwen Lu,et al.  Learning Globally Optimized Object Detector via Policy Gradient , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Larry S. Davis,et al.  An Analysis of Scale Invariance in Object Detection - SNIP , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Albert B Novikoff,et al.  ON CONVERGENCE PROOFS FOR PERCEPTRONS , 1963 .

[21]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[22]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[23]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[25]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[26]  Filip Radlinski,et al.  A support vector method for optimizing average precision , 2007, SIGIR.

[27]  Yang Song,et al.  Training Deep Neural Networks via Direct Loss Minimization , 2015, ICML.

[28]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Zhiqiang Shen,et al.  DSOD: Learning Deeply Supervised Object Detectors from Scratch , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Shai Shalev-Shwartz,et al.  Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..

[31]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yunhong Wang,et al.  Receptive Field Block Net for Accurate and Fast Object Detection , 2017, ECCV.

[34]  W. Krauth,et al.  Learning algorithms with optimal stability in neural networks , 1987 .

[35]  Jianguo Li,et al.  Learning SURF Cascade for Fast and Accurate Object Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Vittorio Ferrari,et al.  End-to-End Training of Object Class Detectors for Mean Average Precision , 2016, ACCV.

[37]  C. V. Jawahar,et al.  Efficient Optimization for Average Precision SVM , 2014, NIPS.

[38]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.