RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images

Semantic segmentation of high resolution remote sensing images is a fundamental and challenging task. Convolutional neural networks (CNNs), such as the fully convolutional network (FCN) and SegNet, have shown outstanding performance in many segmentation tasks. One key pillar of these successes is mining useful information from the features in convolutional layers to produce high resolution segmentation maps. For example, FCN nonlinearly combines high-level features extracted from the last convolutional layers, whereas SegNet uses a deconvolutional network that takes as input only the coarse, high-level feature maps of the last convolutional layer. However, how to better fuse multi-level convolutional feature maps for semantic segmentation of remote sensing images remains underexplored. In this work, we propose a novel end-to-end trainable bidirectional network called recurrent network in fully convolutional network (RiFCN). It has a forward stream and a backward stream. The former is a classification CNN architecture for feature extraction, which takes an input image and produces multi-level convolutional feature maps from shallow to deep. In the latter, to achieve accurate boundary inference and semantic segmentation, boundary-aware high resolution feature maps from shallower layers and high-level but low-resolution features are recursively embedded into the learning framework (from deep to shallow) to generate a fused feature representation that captures not only high-level semantic information but also low-level fine-grained details. Experimental results on two widely used high resolution remote sensing data sets for semantic segmentation, ISPRS Potsdam and the Inria Aerial Image Labeling Data Set, demonstrate that the proposed method achieves competitive performance compared with the other approaches studied.
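The forward/backward structure described above can be illustrated with a small NumPy sketch. It is a minimal illustration under assumptions that the abstract does not state: each forward stage is a 1×1 convolution, ReLU, and 2×2 average pooling (a stand-in for the real convolutional blocks), and each backward step adds a 1×1 projection of the current level's features to a 1×1 projection of the nearest-neighbour-upsampled fused state from the next deeper level, then applies ReLU, i.e. h_L = relu(U_L x_L) and h_l = relu(U_l x_l + W_l up(h_{l+1})). All function names, weight shapes, and the fusion rule itself are hypothetical; the actual RiFCN layers, recurrent cell, and upsampling scheme may differ.

```python
import numpy as np


def avg_pool2x(x):
    """2x2 average pooling of a (C, H, W) map; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight):
    """1x1 convolution: weight is (C_out, C_in), x is (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def relu(x):
    return np.maximum(x, 0.0)


def forward_stream(image, stage_weights):
    """Hypothetical forward stream: each stage projects channels and halves
    the resolution. Returns feature maps ordered shallow -> deep."""
    feats, x = [], image
    for w in stage_weights:
        x = avg_pool2x(relu(conv1x1(x, w)))
        feats.append(x)
    return feats


def backward_stream(feats, input_weights, hidden_weights):
    """Hypothetical recurrent fusion from deep to shallow.
    The deepest level starts the recursion; every shallower level adds its own
    projected features to the upsampled fused state coming from below."""
    h = None
    for level in reversed(range(len(feats))):
        pre = conv1x1(feats[level], input_weights[level])
        if h is not None:
            pre = pre + conv1x1(upsample2x(h), hidden_weights[level])
        h = relu(pre)
    return h  # resolution of the shallowest stage


# Usage example with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
in_ch, hidden, n_classes = 3, 8, 6
stage_channels = [16, 32, 64]
image = rng.standard_normal((in_ch, 64, 64))

fw, c_prev = [], in_ch
for c in stage_channels:
    fw.append(rng.standard_normal((c, c_prev)) * 0.1)
    c_prev = c
feats = forward_stream(image, fw)  # (16,32,32), (32,16,16), (64,8,8)

in_w = [rng.standard_normal((hidden, c)) * 0.1 for c in stage_channels]
hid_w = [rng.standard_normal((hidden, hidden)) * 0.1 for _ in stage_channels]
fused = backward_stream(feats, in_w, hid_w)  # (8, 32, 32)

cls_w = rng.standard_normal((n_classes, hidden)) * 0.1
logits = upsample2x(conv1x1(fused, cls_w))  # back to input resolution
labels = logits.argmax(axis=0)  # per-pixel class map, shape (64, 64)
```

Because the recursion ends at the shallowest level, the fused map keeps that stage's spatial resolution, where fine boundary detail is best preserved, while still carrying information propagated up from the deepest, most semantic stage.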