Rotation-aware and multi-scale convolutional neural network for object detection in remote sensing images

Abstract: Object detection plays an important role in remote sensing imagery analysis. The two most challenging issues in this task are the large variation in object scale and the arbitrary orientation of objects. In this paper, we build a unified framework on the region-based convolutional neural network for arbitrary-oriented, multi-scale object detection in remote sensing images. To handle multi-scale detection, we propose a feature-fusion architecture that generates a multi-scale feature hierarchy: a top-down pathway augments the features of shallow layers with semantic representations, and a bottom-up pathway combines the feature maps of top layers with low-level information. Combining features from different levels yields a powerful feature representation for multi-scale objects. Most previous methods locate objects with arbitrary orientations and dense spatial distributions using axis-aligned boxes, which may also cover adjacent instances and background areas. We instead build a rotation-aware object detector that localizes objects with oriented boxes. The region proposal network augments its anchors with multiple default angles to cover oriented objects, and it produces oriented proposal boxes that enclose objects tightly rather than horizontal proposals that locate them only coarsely. An orientation RoI pooling operation is introduced to extract the feature maps of the oriented proposals for the subsequent R-CNN subnetwork. We conduct comprehensive experiments on a public dataset for oriented object detection in remote sensing images. Our method achieves state-of-the-art performance, which demonstrates the effectiveness of the proposed methods.
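To make the idea of anchors augmented with multiple default angles concrete, the following is a minimal sketch rather than the authors' implementation. The function names (`rotated_anchors`, `corners`, `enclosing_axis_aligned_area`), the `(cx, cy, w, h, theta)` box encoding, and the example scales, aspect ratios, and angles are all assumptions chosen for illustration; the abstract does not specify them. The last helper shows why an axis-aligned box around a rotated object can cover much more area than the oriented box itself.

```python
import itertools
import math


def rotated_anchors(cx, cy, scales, ratios, angles_deg):
    """Return hypothetical (cx, cy, w, h, theta) anchors for one location.

    One anchor is produced per (scale, aspect ratio, angle) combination.
    theta is stored in radians.
    """
    anchors = []
    for scale, ratio, angle in itertools.product(scales, ratios, angles_deg):
        # Keep the area at scale**2 while setting the aspect ratio w / h = ratio.
        w = scale * math.sqrt(ratio)
        h = scale / math.sqrt(ratio)
        anchors.append((cx, cy, w, h, math.radians(angle)))
    return anchors


def corners(anchor):
    """Return the four corner points of an oriented box."""
    cx, cy, w, h, theta = anchor
    c, s = math.cos(theta), math.sin(theta)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    # Rotate each corner offset by theta, then translate to the centre.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]


def enclosing_axis_aligned_area(anchor):
    """Area of the smallest axis-aligned box that contains the oriented box."""
    xs, ys = zip(*corners(anchor))
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


if __name__ == "__main__":
    # Assumed settings: one scale, three aspect ratios, six default angles.
    anchors = rotated_anchors(0.0, 0.0, [64.0], [0.5, 1.0, 2.0],
                              [0, 30, 60, 90, 120, 150])
    print(len(anchors), "anchors at one location")

    elongated = (0.0, 0.0, 128.0, 32.0, math.radians(45))
    print("oriented area:", elongated[2] * elongated[3])
    print("axis-aligned enclosing area:", enclosing_axis_aligned_area(elongated))
```

Under these assumed settings, each location carries one anchor per combination of scale, aspect ratio, and angle. For an elongated box rotated by 45 degrees, the enclosing axis-aligned box is several times larger than the oriented box, which is the extra background and neighbouring-instance coverage that the abstract attributes to horizontal boxes.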
