Multi-Object Segmentation in Complex Urban Scenes from High-Resolution Remote Sensing Data

Extracting terrestrial features such as roads and buildings from aerial images with an automatic system has many uses across a wide range of fields, including disaster management, change detection, land cover assessment, and urban planning. The task is difficult in complex scenes such as urban areas, where buildings and roads are surrounded by shadows, vehicles, trees, and other objects that appear in heterogeneous forms with lower inter-class and higher intra-class contrast. Moreover, manual extraction by human specialists is time-consuming and expensive. Deep convolutional models have shown considerable performance for feature segmentation from remote sensing data in recent years. However, most of these techniques still cannot detect roads and buildings well where large, continuous areas are obstructed. Hence, the principal goal of this work is to introduce two novel deep convolutional models based on the UNet family for multi-object segmentation of roads and buildings from aerial imagery. We focused on buildings and road networks because these objects constitute a large part of urban areas. The presented models are called the multi-level context gating UNet (MCG-UNet) and the bi-directional ConvLSTM UNet (BCL-UNet). The proposed methods combine the advantages of the UNet model, densely connected convolutions, bi-directional ConvLSTM, and the squeeze-and-excitation module to produce high-resolution segmentation maps and preserve boundary information even under complicated backgrounds. Additionally, we implemented a simple yet efficient loss function called boundary-aware loss (BAL), which allows a network to concentrate on hard semantic segmentation regions, such as overlapping areas, small objects, sophisticated objects, and object boundaries, and to produce high-quality segmentation maps. The presented networks were tested on the Massachusetts building and road datasets. The MCG-UNet improved the average F1 accuracy by 1.85% and 1.19%, and by 6.67% and 5.11%, compared with UNet and BCL-UNet for road and building extraction, respectively. Additionally, the presented MCG-UNet and BCL-UNet networks were compared with other state-of-the-art deep learning-based networks, and the results showed the superiority of the proposed networks in multi-object segmentation tasks.
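The F1 figures reported above compare predicted segmentation masks against reference masks. As an illustration only, the following minimal sketch computes a pixel-wise F1 score for a pair of binary masks with NumPy. The function name `f1_score`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for this example, not the evaluation code used in this work.

```python
import numpy as np

def f1_score(pred, truth):
    """Pixel-wise F1 for two binary masks of the same shape (illustrative helper)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # object pixels correctly predicted
    fp = np.count_nonzero(pred & ~truth)   # background pixels predicted as object
    fn = np.count_nonzero(~pred & truth)   # object pixels that were missed
    denom = 2 * tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0

# Toy 3x4 masks: 1 marks an object pixel (e.g. road or building), 0 marks background.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 0, 0]])
pred = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0]])

print(f1_score(pred, truth))  # 3 TP, 1 FP, 1 FN -> 0.75
```

Because F1 is the harmonic mean of precision and recall, it penalizes both missed object pixels and false detections, which is why it is commonly used when object pixels are far fewer than background pixels.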
