HyNet: Hyper-scale object detection network framework for multiple spatial resolution remote sensing imagery

Abstract

Because of the large scale variation of geospatial objects, object detection in multiple spatial resolution (MSR) remote sensing imagery is a challenging task. To address the scale problem, current convolutional neural network (CNN) based object detectors use multi-scale structures at the convolutional-layer level. These structures exploit the different receptive fields of convolutional layers at different scales to capture objects of different sizes. Examples of such methods are the image pyramid, the pyramidal feature hierarchy, and the feature pyramid network. However, in MSR imagery, existing multi-scale structures still struggle to model the large scale variation of geospatial objects, because their receptive fields are limited by the fixed number of layers. To solve this problem, this paper proposes HyNet, a hyper-scale object detection framework for MSR imagery that alleviates the extreme scale-variation problem by learning hyper-scale feature representations. Unlike previous multi-scale structures, which operate at the convolutional-layer level, HyNet is built around a hyper-scale block, the HyBlock, which operates at the sub-layer group level. In the HyBlock, each convolutional layer in the multi-scale structure is first divided into sub-layer groups of equal size. At the sub-layer group level, hyper-scale features are obtained by a multi-scale sub-layer group operation with pyramidal receptive fields in the convolutional layer of each scale, which makes the HyBlock a fine-grained multi-scale structure. To aggregate the hyper-scale features effectively, group connections at the sub-layer level are used for intra-layer message passing. By promoting intra-layer message passing, the group connections capture the scale invariance of the hyper-scale features and thereby alleviate the scale-variation issue for object detection in MSR imagery.
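The abstract does not spell out the HyBlock's layer configuration, so the following is only a minimal NumPy sketch of the idea under stated assumptions. The channels of a feature map are split into equal sub-layer groups. Group g uses a depthwise 3x3 convolution with dilation 2**g, which is our assumed way of obtaining pyramidal receptive fields. The group connection is assumed to be a Res2Net-style hierarchical link in which each group also receives the previous group's output. The names `hyblock`, `dilated_conv3x3_depthwise`, and `effective_receptive_fields` are hypothetical and are not the authors' code.

```python
import numpy as np


def dilated_conv3x3_depthwise(x, kernels, dilation):
    """Depthwise 3x3 cross-correlation (as in CNN frameworks) with zero 'same' padding.

    x: (C, H, W) float feature map; kernels: (C, 3, 3), one kernel per channel.
    A dilation of d spreads the taps d pixels apart, so each output pixel
    sees a (2d + 1) x (2d + 1) window of its input.
    """
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (dilation, dilation), (dilation, dilation)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Shifted view of the padded input that lines up with tap (i, j)
            patch = padded[:, i * dilation:i * dilation + h, j * dilation:j * dilation + w]
            out += kernels[:, i, j][:, None, None] * patch
    return out


def hyblock(x, kernels, num_groups):
    """Hypothetical HyBlock-style layer (assumed design, not the paper's code).

    1. Split the C channels of x into num_groups equal sub-layer groups.
    2. Give group g a dilation of 2**g, so the receptive fields form a pyramid.
    3. Assumed group connection: group g also receives the output of group g - 1,
       a Res2Net-like link standing in for intra-layer message passing.
    """
    c = x.shape[0]
    assert c % num_groups == 0, "channels must split into equal groups"
    groups = np.split(x, num_groups, axis=0)
    kernel_groups = np.split(kernels, num_groups, axis=0)
    outputs, prev = [], None
    for g, (xg, kg) in enumerate(zip(groups, kernel_groups)):
        inp = xg if prev is None else xg + prev  # pass the previous group's output along
        prev = dilated_conv3x3_depthwise(inp, kg, dilation=2 ** g)
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)       # same channel count as the input


def effective_receptive_fields(num_groups):
    """Receptive-field side length of each group once the group connection chains them.

    Each chained 3x3 conv with dilation d widens the field by 2 * d pixels.
    """
    fields, rf = [], 1
    for g in range(num_groups):
        rf += 2 * 2 ** g
        fields.append(rf)
    return fields


if __name__ == "__main__":
    x = np.zeros((4, 41, 41))
    x[0, 20, 20] = 1.0                            # single impulse in the first group
    out = hyblock(x, np.ones((4, 3, 3)), num_groups=4)
    rows = np.nonzero(out[3].any(axis=1))[0]      # rows reached in the last group
    print(effective_receptive_fields(4))          # [3, 7, 15, 31]
    print(rows.max() - rows.min() + 1)            # 31
```

In this sketch, the last of four groups responds over a 31-pixel span. A single 3x3 convolution with dilation 8 covers at most 17 pixels. Chaining the groups inside one layer is what lets the receptive field grow here without adding layers.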
To make better use of the hyper-scale features, adaptive feature selection is proposed. It selects the more effective hyper-scale features by adaptively weighting them. Experimental results on three object detection datasets demonstrate that HyNet learns a robust scale-invariant feature representation and outperforms previous algorithms. It therefore provides an effective new option for object detection in MSR remote sensing imagery.
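Adaptive feature selection is described here only as adaptively weighting the hyper-scale features. One plausible reading uses selective-kernel (SKNet-style) attention, which is an assumption rather than the paper's stated design. The fused features are pooled into a channel descriptor and passed through a small bottleneck. A softmax across scales then gives each channel a convex mix of the hyper-scale features. In the minimal NumPy sketch below, `w_reduce` and `w_expand` stand in for learned parameters, and the function name `adaptive_feature_selection` is hypothetical.

```python
import numpy as np


def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_feature_selection(features, w_reduce, w_expand):
    """Hypothetical SKNet-style weighting of G hyper-scale feature maps.

    features: (G, C, H, W) stack of equally shaped hyper-scale features.
    w_reduce: (C, R) and w_expand: (R, G * C) are stand-ins for learned weights.
    Returns the selected (C, H, W) map and the (G, C) selection weights.
    """
    g, c, _, _ = features.shape
    fused = features.sum(axis=0)                      # (C, H, W): merge all scales
    descriptor = fused.mean(axis=(1, 2))              # (C,): global average pooling
    hidden = np.maximum(descriptor @ w_reduce, 0.0)   # (R,): bottleneck with ReLU
    logits = (hidden @ w_expand).reshape(g, c)        # (G, C): one score per scale and channel
    weights = softmax(logits, axis=0)                 # per channel, weights over scales sum to 1
    selected = (weights[:, :, None, None] * features).sum(axis=0)
    return selected, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((3, 4, 8, 8))         # 3 scales, 4 channels, 8x8 maps
    sel, w = adaptive_feature_selection(
        feats, rng.standard_normal((4, 2)), rng.standard_normal((2, 12)))
    print(sel.shape, w.shape)                         # (4, 8, 8) (3, 4)
```

Setting `w_expand` to zeros gives uniform weights of 1/G, and the output then reduces to the plain average of the inputs. In a trained network, the learned weights would instead favor the scales that are more useful for each channel.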
