Novel up-scale feature aggregation for object detection in aerial images

Abstract Object detection is a pivotal task for many unmanned aerial vehicle (UAV) applications. Compared to general scenes, the objects in aerial images are typically much smaller. For this reason, most general object detectors face two critical challenges when dealing with aerial images: 1) The widely used Feature Pyramid Network works by progressively integrating high-level features into lower levels. However, this scheme does not transfer equivalent information from each level of the backbone network to the generated features, so the shared detection head receives unbalanced sources of information, which damages detection accuracy. 2) Up-sampling is commonly used to expand feature resolution for feature fusion or feature aggregation. However, existing up-sampling methods are ineffective at reconstructing high-resolution feature maps. To address these two challenges, we propose: 1) an up-scale feature aggregation framework that fully utilizes multi-scale complementary information, and 2) a novel up-sampling method that further improves detection accuracy. These two proposals are integrated into an end-to-end single-stage object detector named HawkNet. Extensive experiments are conducted on the VisDrone-DET2018, UAVDT and DIOR datasets. Compared to the RetinaNet baseline, HawkNet achieves absolute gains of 6.0%, 1.2% and 5.9% in average precision (AP) on the VisDrone-DET2018, UAVDT and DIOR datasets, respectively. For an 800 × 1333 input on the UAVDT dataset, HawkNet with a ResNet-50 backbone surpasses existing methods for single-scale inference and achieves the best performance (37.4 AP), while operating at 10.6 frames per second on a single Nvidia GTX 1080Ti GPU.
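The imbalance described in the first challenge can be illustrated with a minimal sketch of a standard FPN-style top-down pathway. This is not HawkNet's aggregation framework or its up-sampling method, which the abstract does not specify. The sketch assumes single-channel feature maps stored as nested lists, nearest-neighbour up-sampling, and no lateral 1 × 1 convolutions; the names `upsample_nearest`, `add_maps` and `top_down_fuse` are hypothetical helpers introduced only for illustration.

```python
def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour up-sampling of a 2-D map given as a list of rows."""
    out = []
    for row in fmap:
        # Repeat every value `factor` times along the width.
        wide = [v for v in row for _ in range(factor)]
        # Repeat the widened row `factor` times along the height.
        for _ in range(factor):
            out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two maps with identical shapes."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_fuse(levels):
    """Fuse backbone maps ordered fine -> coarse (each half the previous size).

    The coarsest map is passed through unchanged; every finer map is summed
    with the up-sampled fused map from the level above it.
    """
    fused = [None] * len(levels)
    fused[-1] = levels[-1]
    for i in range(len(levels) - 2, -1, -1):
        fused[i] = add_maps(levels[i], upsample_nearest(fused[i + 1]))
    return fused


# Toy backbone outputs: constant maps make each level's contribution visible.
c3 = [[1.0] * 4 for _ in range(4)]    # finest level, 4 x 4
c4 = [[10.0] * 2 for _ in range(2)]   # middle level, 2 x 2
c5 = [[100.0]]                        # coarsest level, 1 x 1

p3, p4, p5 = top_down_fuse([c3, c4, c5])
print(p5[0][0], p4[0][0], p3[0][0])   # 100.0 110.0 111.0
```

In this toy setting the finest output mixes all three backbone levels, while the coarsest output contains only its own level, which is one way to see why a detection head shared across levels receives unequal information.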
