LOANet: a lightweight network using object attention for extracting buildings and roads from UAV aerial remote sensing images

Deep-learning-based semantic segmentation has become a more efficient and convenient way to extract buildings and roads from uncrewed aerial vehicle (UAV) remote sensing images than traditional manual segmentation in surveying and mapping. To keep the model lightweight while improving accuracy, a lightweight network using object attention (LOANet) for extracting buildings and roads from UAV aerial remote sensing images is proposed. The proposed network adopts an encoder-decoder architecture in which a lightweight densely connected network (LDCNet) is developed as the encoder. In the decoder, dual multi-scale context modules, consisting of the atrous spatial pyramid pooling (ASPP) module and the object attention module (OAM), are designed to capture more context information from the feature maps of UAV remote sensing images. Between ASPP and OAM, a feature pyramid network (FPN) module fuses the multi-scale features extracted by ASPP. A private dataset of UAV remote sensing images is constructed, containing 2431 training, 945 validation, and 475 test samples. The proposed basic model performs well on this dataset, achieving excellent mean Intersection-over-Union (mIoU) with only 1.4M parameters and 5.48G floating point operations (FLOPs). Further experiments on the publicly available LoveDA and CITY-OSM datasets validate the effectiveness of the proposed basic and large models, which also achieve outstanding mIoU results. The code is available at https://github.com/GtLinyer/LOANet.
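Since mIoU is the headline metric above, a minimal sketch of how it is commonly computed may help. This is not taken from the LOANet code: the function name `mean_iou`, the class ids (0 = background, 1 = building, 2 = road), and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration, and the paper's evaluation protocol may differ.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection-over-Union over flattened per-pixel class labels."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where prediction and truth agree on class c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical class ids: 0 = background, 1 = building, 2 = road
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3 (background), 2/3 (building), and 1/2 (road), so the mean is 0.5. For real segmentation maps the same counts are usually accumulated in a confusion matrix over the whole test set before averaging.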