Pop-Net: Encoder-Dual Decoder for Semantic Segmentation and Single-View Height Estimation

The single-view semantic 3D challenge of the 2019 Data Fusion Contest requires predicting both semantic labels and a normalized digital surface model (nDSM) for urban scenes from single-view satellite images. We propose a novel pyramid on pyramid network (Pop-Net), built on an Encoder-Dual Decoder framework for end-to-end multi-task learning. The encoder is a deformable ResNet-101 backbone network. Two feature pyramid networks serve as decoders, responsible for semantic segmentation and height estimation, respectively. Because semantic information is crucial for estimating height, a regression pyramid is built on top of the semantic pyramid so that semantic features can help height estimation. To deal with outliers in heights, we use anchor-based regression and optimize with the smooth L1 loss, which yields more robust height estimation. Without bells and whistles, our single-model entry achieves 77.78% mIoU and 53.40% mIoU-3 on the test set, ranking 2nd in the Single-view Semantic 3D Challenge of the 2019 IEEE GRSS Data Fusion Contest. The code is available at https://github.com/Z-Zheng/PopNet.
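
The abstract names anchor-based regression with the smooth L1 loss but does not spell out the details. The Python sketch below shows one common way such a scheme can work, assuming a fixed list of anchor heights, nearest-anchor assignment, and a per-anchor offset prediction for each pixel. The anchor values, the function names (`smooth_l1`, `assign_anchor`, `regression_loss`, `decode_height`) and the data layout are hypothetical and are not taken from the Pop-Net code.

```python
from typing import Sequence, Tuple


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Smooth L1 loss in the form used by Fast/Faster R-CNN.

    Quadratic for |x| < beta and linear beyond it, so a large residual
    (e.g. from an outlier height) has a gradient bounded by 1.
    """
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def assign_anchor(height: float, anchors: Sequence[float]) -> Tuple[int, float]:
    """Hypothetical rule: pick the nearest anchor and regress the residual."""
    k = min(range(len(anchors)), key=lambda i: abs(height - anchors[i]))
    return k, height - anchors[k]


def regression_loss(
    pred_offsets: Sequence[Sequence[float]],
    gt_heights: Sequence[float],
    anchors: Sequence[float],
    beta: float = 1.0,
) -> float:
    """Mean smooth L1 loss on the offset of each pixel's assigned anchor.

    pred_offsets[p][k] is the predicted offset of pixel p relative to
    anchor k (a hypothetical layout chosen for this sketch).
    """
    total = 0.0
    for offsets, h in zip(pred_offsets, gt_heights):
        k, target = assign_anchor(h, anchors)
        total += smooth_l1(offsets[k] - target, beta)
    return total / len(gt_heights)


def decode_height(
    scores: Sequence[float], offsets: Sequence[float], anchors: Sequence[float]
) -> float:
    """Select the highest-scoring anchor and add its predicted offset."""
    k = max(range(len(anchors)), key=lambda i: scores[i])
    return anchors[k] + offsets[k]


if __name__ == "__main__":
    anchors = [0.0, 10.0, 20.0]  # hypothetical anchor heights in metres
    print(assign_anchor(12.5, anchors))  # (1, 2.5)
    print(decode_height([0.1, 0.7, 0.2], [0.0, 2.5, -1.0], anchors))  # 12.5
```

Because the loss is linear for residuals larger than `beta`, a pixel whose reference height is an outlier raises the loss in proportion to its error rather than its squared error, which is the robustness property the abstract appeals to.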
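
The reported scores are mIoU and mIoU-3. The sketch below computes mean IoU from flat lists of per-pixel class indices and, optionally, a height-aware variant in which a correctly labelled pixel counts toward the intersection only when its height error is within a tolerance. Treating mIoU-3 this way is an assumption made for illustration; the official definition and threshold should be taken from the 2019 Data Fusion Contest materials. The function name `mean_iou` and its arguments are hypothetical.

```python
from typing import List, Optional, Sequence


def mean_iou(
    pred: Sequence[int],
    gt: Sequence[int],
    num_classes: int,
    pred_h: Optional[Sequence[float]] = None,
    gt_h: Optional[Sequence[float]] = None,
    height_tol: Optional[float] = None,
) -> float:
    """Mean IoU over the classes that occur in pred or gt.

    With height_tol set, a correctly labelled pixel joins the intersection
    only if |pred_h - gt_h| <= height_tol; the union is unchanged. This is
    an assumed reading of a height-aware IoU, not the official scorer.
    """
    inter: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for i, (p, g) in enumerate(zip(pred, gt)):
        if p == g:
            union[g] += 1
            height_ok = height_tol is None or abs(pred_h[i] - gt_h[i]) <= height_tol
            if height_ok:
                inter[g] += 1
        else:
            # A mislabelled pixel enlarges the union of both classes involved.
            union[p] += 1
            union[g] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    if not ious:
        raise ValueError("no pixels to evaluate")
    return sum(ious) / len(ious)
```

For example, with `pred=[0, 0, 1, 1]` and `gt=[0, 1, 1, 1]` the plain call gives (1/2 + 2/3) / 2 ≈ 0.583. Supplying heights in which one correctly labelled class-1 pixel is off by 4 m, together with an illustrative `height_tol=1.0`, lowers the class-1 IoU from 2/3 to 1/3.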
