Multi-Scale Feature Fusion Network for Object Detection in VHR Optical Remote Sensing Images

In this paper, we propose a multi-scale feature fusion network (MS-FF Net), based on a convolutional neural network (CNN), for object detection in very high resolution (VHR) optical remote sensing images. In a CNN, the low-level layers carry rich detail information, while the high-level layers carry rich semantic information. Inspired by the idea of feature fusion, we introduce an additional multi-scale feature fusion layer (MFL) that fuses detail and semantic features, so that the network accounts for both large and small objects. In addition, the network architecture and training strategies are designed to improve performance. Experiments on the NWPU VHR-10 dataset demonstrate that the method with MFLs achieves a significant improvement and outperforms the compared methods in terms of mean average precision. In particular, the detection precision for the airplane, baseball diamond, basketball court, ground track field, and harbor categories exceeds 90%, which is much higher than that of the compared methods.
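The general idea of fusing detail and semantic features can be illustrated with a minimal sketch. The abstract does not specify how the MFL combines features, so the fusion rule here (nearest-neighbour upsampling of the high-level map followed by channel-wise concatenation) is an assumption chosen for illustration, and the function names `upsample_nearest` and `fuse_features`, as well as the tensor shapes, are hypothetical.

```python
import numpy as np

def upsample_nearest(feature, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map by an integer factor."""
    return feature.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_features(low_level, high_level):
    """Concatenate a low-level map with an upsampled high-level map along channels.

    Hypothetical fusion rule for illustration only; the paper's MFL may differ.
    """
    # Integer scale factor between the two spatial resolutions.
    factor = low_level.shape[1] // high_level.shape[1]
    upsampled = upsample_nearest(high_level, factor)
    if upsampled.shape[1:] != low_level.shape[1:]:
        raise ValueError("spatial sizes must differ by an integer factor")
    # Channel axis is 0 in the (C, H, W) layout assumed here.
    return np.concatenate([low_level, upsampled], axis=0)

# Assumed shapes: a detailed, shallow map and a coarse, deep map.
low = np.random.rand(64, 32, 32)    # low-level: fine spatial detail
high = np.random.rand(256, 8, 8)    # high-level: coarse but semantic
fused = fuse_features(low, high)
print(fused.shape)  # (320, 32, 32)
```

In this sketch the fused map keeps the spatial resolution of the low-level features, which matters for small objects, while also carrying the semantic channels of the high-level features.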
