Dilated Convolution and Feature Fusion SSD Network for Small Object Detection in Remote Sensing Images

Current methods perform poorly at detecting small objects in remote sensing images. To address this, we propose a novel single shot multibox detector (SSD) network based on dilated convolution and feature fusion, which we call the dilated convolution and feature fusion single shot multibox detector (DFSSD). The algorithm removes the random clipping (cropping) step from the data preprocessing layers of the conventional SSD network and adopts the structure of the feature pyramid network (FPN) to fuse the high-resolution low-level feature map with the semantically rich high-level feature map. It also enlarges the receptive field of the third-level feature map of the DFSSD network by using dilated convolution. In the data processing step, we segment the images around feature point region proposals to increase the training sample size. When tested on remote sensing datasets, the proposed DFSSD network reaches a mean average precision (mAP) of 76.51%, which is 6.70 percentage points higher than that of the SSD model (69.81%).
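
The two building blocks named above can be illustrated with a minimal, dependency-free sketch. It is not the DFSSD implementation: the function names are hypothetical, the dilated convolution is shown in one dimension for brevity, and the fusion uses element-wise addition after 2x nearest-neighbour upsampling as in the original FPN, which is an assumption because the abstract does not say how the maps are combined. The lateral 1x1 convolution that FPN applies before the addition is omitted.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered along one axis by a kernel with the given dilation rate."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded), stride-1 dilated cross-correlation in one dimension."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


def upsample_nearest_2x(fmap):
    """Double the height and width of a 2-D map by repeating each value."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_top_down(low_level, high_level):
    """Assumed FPN-style fusion: upsample the coarse map, then add element-wise."""
    up = upsample_nearest_2x(high_level)
    return [[a + b for a, b in zip(row_low, row_up)]
            for row_low, row_up in zip(low_level, up)]


if __name__ == "__main__":
    # A 3-tap kernel with dilation 2 spans 5 input samples with the same 3 weights.
    print(effective_kernel_size(3, 2))
    print(dilated_conv1d([1, 2, 3, 4, 5, 6], [1, 1, 1], dilation=2))

    low = [[1, 1, 1, 1] for _ in range(4)]   # high-resolution, low-level map
    high = [[10, 20], [30, 40]]              # low-resolution, high-level map
    for row in fuse_top_down(low, high):
        print(row)
```

The sketch shows why dilation is attractive for small objects: the receptive field grows from 3 to 5 samples without adding weights or reducing resolution, while the fusion step lets a high-resolution map inherit semantic information from a coarser one.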
