Small Object Detection in Unmanned Aerial Vehicle Images Using Feature Fusion and Scaling-Based Single Shot Detector With Spatial Context Analysis

Objects in unmanned aerial vehicle (UAV) images are generally small due to the high-photography altitude. Although many efforts have been made in object detection, how to accurately and quickly detect small objects is still one of the remaining open challenges. In this paper, we propose a feature fusion and scaling-based single shot detector (FS-SSD) for small object detection in the UAV images. The FS-SSD is an enhancement based on FSSD, a variety of the original single shot multibox detector (SSD). We add an extra scaling branch of the deconvolution module with an average pooling operation to form a feature pyramid. The original feature fusion branch is adjusted to be better suited to the small object detection task. The two feature pyramids generated by the deconvolution module and feature fusion module are utilized to make predictions together. In addition to the deep features learned by the FS-SSD, to further improve the detection accuracy, spatial context analysis is proposed to incorporate the object spatial relationships into object redetection. The interclass and intraclass distances between different object instances are computed as a spatial context, which proves effective for multiclass small object detection. Six experiments are conducted on the PASCAL VOC dataset and the two UAV image datasets. The experimental results demonstrate that the proposed method can achieve a comparable detection speed but an accuracy superior to those of the six state-of-the-art methods.

[1]  I. Colomina,et al.  Unmanned aerial systems for photogrammetry and remote sensing: A review , 2014 .

[2]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[3]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[4]  Nuri Yilmazer,et al.  Application of Object Detection and Tracking Techniques for Unmanned Aerial Vehicles , 2015, Complex Adaptive Systems.

[5]  Yoshua Bengio,et al.  Generative Adversarial Networks , 2014, ArXiv.

[6]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[8]  Farid Melgani,et al.  A Convolutional Neural Network Approach for Assisting Avalanche Search and Rescue Operations with UAV Imagery , 2017, Remote. Sens..

[9]  Jinhui Tang,et al.  Generalized Deep Transfer Networks for Knowledge Propagation in Heterogeneous Domains , 2016, ACM Trans. Multim. Comput. Commun. Appl..

[10]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Fangliang Chen,et al.  Detecting and tracking vehicles in traffic by unmanned aerial vehicles , 2016 .

[12]  Aakanksha Chowdhery,et al.  Networked Drone Cameras for Sports Streaming , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).

[13]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[14]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[15]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Ying Liu,et al.  An enhanced SSD with feature fusion and visual reasoning for object detection , 2019, Neural Computing and Applications.

[18]  Rakesh Mehta,et al.  Object detection at 200 Frames Per Second , 2018, ECCV Workshops.

[19]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[21]  Jinhui Tang,et al.  Weakly-Shared Deep Transfer Networks for Heterogeneous-Domain Knowledge Propagation , 2015, ACM Multimedia.

[22]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[23]  Qionghai Dai,et al.  Depth Estimation by Parameter Transfer With a Lightweight Model for Single Still Images , 2017, IEEE Transactions on Circuits and Systems for Video Technology.

[24]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Xiangyu Zhang,et al.  DetNet: A Backbone network for Object Detection , 2018, ArXiv.

[26]  Ling Shao,et al.  Efficient Feature Selection and Classification for Vehicle Detection , 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[27]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Farid Melgani,et al.  Detecting Cars in UAV Images With a Catalog-Based Approach , 2014, IEEE Transactions on Geoscience and Remote Sensing.

[29]  Nikolaos Grammalidis,et al.  Spatio-Temporal Flame Modeling and Dynamic Texture Analysis for Automatic Video-Based Fire Detection , 2015, IEEE Transactions on Circuits and Systems for Video Technology.

[30]  Qiaosong Wang,et al.  Object Recognition in Aerial Images Using Convolutional Neural Networks , 2017, J. Imaging.

[31]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[32]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[33]  Winston H. Hsu,et al.  Drone-Based Object Counting by Spatially Regularized Regional Proposal Network , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Guo-Jun Qi,et al.  Hierarchically Gated Deep Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  M. Bar Visual objects in context , 2004, Nature Reviews Neuroscience.

[36]  Silvio Savarese,et al.  Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes , 2016, ECCV.

[37]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[38]  LOW ALTITUDE AERIAL PHOTOGRAPHY , 2006 .

[39]  Fuqiang Zhou,et al.  FSSD: Feature Fusion Single Shot Multibox Detector , 2017, ArXiv.

[40]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[41]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[42]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.