ESPC_NASUnet: An End-to-End Super-Resolution Semantic Segmentation Network for Mapping Buildings From Remote Sensing Images

Mapping buildings at higher resolution from lower resolution remote sensing images is in great demand because access to higher resolution data is limited, especially in the context of disaster assessment. A high-resolution building layout map is crucial for emergency rescue after a disaster, and emergency response time would be reduced if detailed building footprints could be delineated from more easily available low-resolution data. To achieve this goal, we propose a super-resolution semantic segmentation network called ESPC_NASUnet, which consists of a feature super-resolution module and a semantic segmentation module. To the best of the authors' knowledge, this is the first work to systematically explore a deep learning-based approach that generates semantic maps with higher spatial resolution from lower spatial resolution remote sensing images in an end-to-end fashion. Experimental results on two datasets suggest that the proposed network performs best among four different end-to-end architectures in terms of both pixel-level and object-level metrics. In terms of pixel-level $F_1$-score, the improvements exceed 0.068 and 0.055 on the two datasets. In terms of object-level $F_1$-score, ESPC_NASUnet leads the other end-to-end methods by more than 0.083 and 0.161 on the two datasets, respectively. Compared with stage-wise methods, our end-to-end network is less affected by low-resolution input images. Finally, on both datasets the proposed network produces building semantic maps comparable to those generated by semantic segmentation networks trained with high-resolution images and the corresponding ground truth.
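The pixel-level $F_1$-score quoted above is the harmonic mean of precision and recall over building pixels. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `pixel_f1` and the toy masks are assumptions made for illustration. The object-level variant additionally requires a rule for matching predicted and reference buildings, which the abstract does not specify, so it is not sketched here.

```python
import numpy as np

def pixel_f1(pred, truth):
    """Pixel-level F1 for binary building masks (hypothetical helper).

    pred, truth: arrays of the same shape; nonzero means "building".
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # building pixels correctly predicted
    fp = np.logical_and(pred, ~truth).sum()   # background predicted as building
    fn = np.logical_and(~pred, truth).sum()   # building pixels missed
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all
    return 0.0 if denom == 0 else float(2 * tp / denom)

# Toy 2x3 masks (made-up values, only to exercise the function)
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = pixel_f1(pred, truth)  # TP = 2, FP = 1, FN = 1
```

In a super-resolution segmentation setting, `pred` would be the network output at the higher target resolution and `truth` the high-resolution ground-truth mask, so both arrays share the same shape.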
