Multi-Scale Semantic Segmentation and Spatial Relationship Recognition of Remote Sensing Images Based on an Attention Model

A comprehensive interpretation of remote sensing images involves not only recognizing remote sensing objects but also recognizing the spatial relationships between them. Spatial relationships are especially useful when different objects share the same spectrum, because they help interpret those objects more accurately. Compared with traditional remote sensing object recognition methods, deep learning offers high accuracy and strong generalizability in scene classification and semantic segmentation. However, it is difficult to recognize remote sensing objects and their spatial relationships simultaneously and end-to-end using existing deep learning networks alone. To address this problem, we propose a multi-scale remote sensing image interpretation network, called the MSRIN. The MSRIN is a parallel deep neural network based on a fully convolutional network (FCN), a U-Net, and a long short-term memory network (LSTM). It recognizes remote sensing objects and their spatial relationships in three steps. First, the MSRIN defines a multi-scale remote sensing image caption strategy and segments the same image with the FCN and the U-Net at different spatial scales, so that a two-scale hierarchy is formed. The outputs of the FCN and U-Net are masked to obtain the locations and boundaries of remote sensing objects. Second, an attention-based LSTM generates image captions that contain the remote sensing objects (nouns) and their spatial relationships, described in natural language. Finally, we design a remote sensing object recognition and correction mechanism that uses an attention weight matrix to link the nouns in the captions to the object mask graphs, thereby transferring the spatial relationships from the captions to the object mask graphs. In other words, the MSRIN performs semantic segmentation of remote sensing objects and identifies their spatial relationships end-to-end. Experimental results show that, compared with the results before correction, the matching rate between samples and the mask graph increased by 67.37 percentage points, and the matching rate between nouns and the mask graph increased by 41.78 percentage points. These results indicate that the correction mechanism substantially improves the alignment between caption nouns and segmented objects.
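The idea of linking caption nouns to object masks through attention weights can be illustrated with a small sketch. This is not the MSRIN implementation: the function names, the toy 2x2 attention maps, and the scoring rule (mean attention inside each mask, highest score wins) are all assumptions made here for illustration, since the abstract does not specify how the attention weight matrix is compared with the masks.

```python
# Illustrative sketch only; names and the scoring rule are assumptions,
# not the method described in the paper.

def mean_attention_in_mask(weights, mask):
    """Average attention weight over the pixels where the mask is True."""
    values = [w
              for w_row, m_row in zip(weights, mask)
              for w, m in zip(w_row, m_row) if m]
    return sum(values) / len(values) if values else 0.0


def match_nouns_to_masks(attention, masks, noun_positions):
    """Assign each noun (by word position) to the mask with the highest mean attention.

    attention: one 2-D attention map per caption word.
    masks: one 2-D boolean map per segmented object.
    """
    matches = {}
    for t in noun_positions:
        scores = [mean_attention_in_mask(attention[t], mask) for mask in masks]
        matches[t] = max(range(len(masks)), key=scores.__getitem__)
    return matches


# Toy caption "building beside road": words 0 and 2 are nouns.
attention = [
    [[0.70, 0.20], [0.05, 0.05]],   # "building" attends to the top row
    [[0.25, 0.25], [0.25, 0.25]],   # "beside" attends uniformly
    [[0.05, 0.05], [0.10, 0.80]],   # "road" attends to the bottom row
]
masks = [
    [[True, True], [False, False]],   # object 0 covers the top row
    [[False, False], [True, True]],   # object 1 covers the bottom row
]

matches = match_nouns_to_masks(attention, masks, noun_positions=[0, 2])
print(matches)
```

Once each noun is tied to a mask in this way, a relation word between two nouns in the caption (here "beside") can be read as a relation between the two corresponding masks.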
