Panoptic Segmentation Meets Remote Sensing

Deep learning (DL) methods have achieved state-of-the-art results in remote sensing image segmentation, and their use is growing. Most studies focus on semantic and instance segmentation, leaving a research gap in panoptic segmentation. Panoptic segmentation combines instance and semantic predictions, allowing the simultaneous detection of "things" (countable objects) and "stuff" (different backgrounds). An effective approach to panoptic segmentation in remotely sensed data is promising for many challenging problems because it allows continuous mapping and specific target counting. Several difficulties have prevented the growth of this task in remote sensing: (a) most algorithms are designed for traditional images, (b) image labeling must encompass both "things" and "stuff" classes, which is much more laborious, and (c) the annotation format is complex. Thus, aiming to make panoptic segmentation more operable in remote sensing, this study has five objectives: (1) create a novel data preparation pipeline for the panoptic segmentation task using GIS tools; (2) propose novel annotation conversion software that generates panoptic annotations in the COCO format automatically; (3) propose a novel dataset on urban areas; (4) modify and leverage the Detectron2 architecture and software for the task; and (5) evaluate semantic, instance, and panoptic metrics and present the difficulties of this task in the urban setting. We used an aerial image with a 0.24-meter spatial resolution in the city of Brasília, covering an area of 79,401 m. The annotations considered fourteen classes (three "stuff" and eleven "thing" categories). Our proposed pipeline considers three image inputs (original image, semantic image, and panoptic image). The proposed software uses these inputs alongside point shapefiles, creating samples at the centroid of each point shapefile with their corresponding annotations in the COCO format. Using points allows researchers to choose samples in critical areas. Our study generated 3,400 samples with 512x512 pixel dimensions (3,000 for training, 200 for validation, and 200 for testing). The analysis used the Panoptic-FPN model with two backbones (ResNet-50 and ResNet-101), and the model evaluation considered three metric types (semantic, instance, and panoptic metrics). Regarding the main metrics, we obtained 93.865, 47.691, and 64.979 for the mean Intersection over Union, box Average Precision, and Panoptic Quality, respectively. Our study presents the first effective pipeline for panoptic segmentation and an extensive database that other researchers can use and adapt to other data or to related problems requiring a thorough scene understanding.

Corresponding author: osmarjr@unb.br. Email addresses: osmarcarvalho@ieee.org (Osmar Luiz Ferreira de Carvalho), cristiano@dubbox.org (Cristiano Rosa e Silva), anesmar@ieee.org (Anesmar Olino de Albuquerque), nickolas.santana@unb.br (Nickolas Castro Santana), dibio@unb.br (Dibio Leandro Borges), robertogomes@unb.br (Roberto Arnaldo Trancoso Gomes), renatofg@unb.br (Renato Fontes Guimarães). Preprint, December 1, 2021. arXiv:2111.12126v2 [cs.CV], 30 Nov 2021.
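The Panoptic Quality (PQ) figure reported above can be illustrated with a minimal sketch. It assumes the standard definition of the metric: a predicted and a ground-truth segment match when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by |TP| + 0.5|FP| + 0.5|FN|. The function name `panoptic_quality`, the use of 0 as a void label, and the restriction to a single class are simplifying assumptions made here for illustration; this is not the evaluation code used in the study.

```python
import numpy as np


def panoptic_quality(gt, pred, iou_threshold=0.5):
    """Single-class PQ from two segment-id maps (id 0 is treated as void).

    Hypothetical helper for illustration: it handles one class only and
    does not implement the void/crowd rules of a full panoptic evaluator.
    """
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [i for i in np.unique(pred) if i != 0]

    matched_pred = set()
    iou_sum = 0.0
    tp = 0
    for g in gt_ids:
        g_mask = gt == g
        for p in pred_ids:
            if p in matched_pred:
                continue
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            if inter == 0:
                continue
            union = np.logical_or(g_mask, p_mask).sum()
            iou = inter / union
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if iou > iou_threshold:
                iou_sum += iou
                tp += 1
                matched_pred.add(p)
                break

    fp = len(pred_ids) - tp  # predicted segments left unmatched
    fn = len(gt_ids) - tp    # ground-truth segments left unmatched
    denom = tp + 0.5 * fp + 0.5 * fn
    return float(iou_sum / denom) if denom > 0 else 0.0


# Toy 4x4 example: one exact match, one missed object, one spurious segment.
gt = np.zeros((4, 4), dtype=int)
gt[0:2, 0:2] = 1
gt[2:4, 2:4] = 2
pred = np.zeros((4, 4), dtype=int)
pred[0:2, 0:2] = 1   # IoU 1.0 with ground-truth segment 1
pred[3, 3] = 5       # IoU 0.25 with ground-truth segment 2, so no match
pq = panoptic_quality(gt, pred)
```

In this toy case there is one true positive, one false positive, and one false negative, so the function returns a value on a 0 to 1 scale; the values in the abstract appear to be on a 0 to 100 scale.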
