Feature Pyramid Networks for Object Detection

Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But pyramid representations have been avoided in recent object detectors that are based on deep convolutional networks, partially because they are slow to compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

[1]  Edward H. Adelson,et al.  PYRAMID METHODS IN IMAGE PROCESSING. , 1984 .

[2]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[3]  R. Vaillant,et al.  An original approach for the localization of objects in images , 1993 .

[4]  R. Vaillant,et al.  Original approach for the localisation of objects in images , 1994 .

[5]  Takeo Kanade,et al.  Human Face Detection in Visual Scenes , 1995, NIPS.

[6]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[7]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[10]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[11]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  R. Fergus,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[13]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[14]  Pietro Perona,et al.  Fast Feature Pyramids for Object Detection , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[16]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[19]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[20]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[21]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Nikos Komodakis,et al.  Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[23]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[24]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[25]  Soumith Chintala,et al.  A MultiPath Network for Object Detection , 2016, BMVC.

[26]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[27]  Charless C. Fowlkes,et al.  Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation , 2016, ECCV.

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Ronan Collobert,et al.  Learning to Refine Object Segments , 2016, ECCV.

[30]  Sina Honari,et al.  Recombinator Networks: Learning Coarse-to-Fine Feature Aggregation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[32]  Abhinav Gupta,et al.  Training Region-Based Object Detectors with Online Hard Example Mining , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[34]  Fuchun Sun,et al.  HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Nikos Komodakis,et al.  Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization , 2016, BMVC.

[36]  Kavita Bala,et al.  Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Yi Li,et al.  Instance-Sensitive Fully Convolutional Networks , 2016, ECCV.

[38]  Rogério Schmidt Feris,et al.  A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection , 2016, ECCV.

[39]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[41]  Jian Sun,et al.  Object Detection Networks on Convolutional Feature Maps , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.