Paper information - Web-Net: A Novel Nest Networks with Ultra-Hierarchical Sampling for Building Extraction from Aerial Imageries

With the proliferation of high-resolution remote sensing sensors and platforms, making efficient use of the vast amounts of easily accessible aerial imagery has become a critical challenge for researchers. Recently, deep neural networks (DNNs) have become a focus in remote sensing and have achieved remarkable progress in image classification and segmentation tasks. However, current DNN models inevitably lose local cues during downsampling. Additionally, even with skip connections, upsampling methods cannot properly recover structural information such as edge intersections, parallelism, and symmetry. To handle these issues, we propose Web-Net, a nested network architecture with hierarchical dense connections. We design the Ultra-Hierarchical Sampling (UHS) block to absorb and fuse inter-level feature maps and to propagate them among different levels. The position-wise downsampling/upsampling methods in the UHS block iteratively change the shape of the inputs while preserving the number of their parameters, so that low-level local cues and high-level semantic cues are properly preserved. We verify the effectiveness of the proposed Web-Net on the Inria Aerial Dataset and the WHU Dataset. Web-Net achieves an overall accuracy of 96.97% and an IoU (Intersection over Union) of 80.10% on the Inria Aerial Dataset, surpassing the state-of-the-art SegNet by 1.8% and 9.96%, respectively; the results on the WHU Dataset also support the effectiveness of the proposed Web-Net. Additionally, benefiting from the nested network architecture and the UHS block, the buildings extracted on the prediction maps are noticeably sharper and more accurately identified, and even building areas covered by shadows are correctly extracted. These results indicate that the proposed Web-Net is both effective and efficient for building extraction from high-resolution remote sensing images.
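The abstract does not spell out how the position-wise downsampling/upsampling in the UHS block is implemented. As a minimal sketch, assuming it behaves like the rearrangement commonly called space-to-depth (and its inverse, depth-to-space), the following pure-Python example shows how a feature map can change shape while keeping every value. The function names `space_to_depth`, `depth_to_space`, and `count_elements` and the block size `r` are hypothetical and are not taken from the paper.

```python
def space_to_depth(x, r=2):
    """Rearrange a C x H x W nested list into (C*r*r) x (H/r) x (W/r).

    Assumption: this only illustrates shape-changing, value-preserving
    downsampling; it is not the paper's confirmed UHS operation.
    """
    h, w = len(x[0]), len(x[0][0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    out = []
    for channel in x:
        # Each (dy, dx) offset inside an r x r block becomes its own channel.
        for dy in range(r):
            for dx in range(r):
                out.append([[channel[i + dy][j + dx] for j in range(0, w, r)]
                            for i in range(0, h, r)])
    return out


def depth_to_space(y, r=2):
    """Inverse rearrangement: (C*r*r) x H x W back to C x (H*r) x (W*r)."""
    if len(y) % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    h, w = len(y[0]), len(y[0][0])
    out = []
    for c in range(len(y) // (r * r)):
        plane = [[None] * (w * r) for _ in range(h * r)]
        for dy in range(r):
            for dx in range(r):
                src = y[c * r * r + dy * r + dx]
                for i in range(h):
                    for j in range(w):
                        plane[i * r + dy][j * r + dx] = src[i][j]
        out.append(plane)
    return out


def count_elements(x):
    """Total number of values in a C x H x W nested list."""
    return sum(len(row) for channel in x for row in channel)


# One 4 x 4 channel holding the values 0..15.
feature_map = [[[row * 4 + col for col in range(4)] for row in range(4)]]
down = space_to_depth(feature_map)   # 4 channels of size 2 x 2
restored = depth_to_space(down)      # back to 1 channel of size 4 x 4
```

Under this assumption, downsampling halves the spatial resolution and quadruples the channel count, so no value is discarded, and the inverse step recovers the input exactly. Strided pooling, by contrast, drops three of every four positions.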
