Segmentation of VHR EO Images using Unsupervised Learning

Semantic segmentation is a crucial step in many Earth observation tasks. Large quantity of pixel-level annotation is required to train deep networks for semantic segmentation. Earth observation techniques are applied to varieties of applications and since classes vary widely depending on the applications, therefore, domain knowledge is often required to label Earth observation images, impeding availability of labeled training data in many Earth observation applications. To tackle these challenges, in this paper we propose an unsupervised semantic segmentation method that can be trained using just a single unlabeled scene. Remote sensing scenes are generally large. The proposed method exploits this property to sample smaller patches from the larger scene and uses deep clustering and contrastive learning to refine the weights of a lightweight deep model composed of a series of the convolution layers along with an embedded channel attention. After unsupervised training on the target image/scene, the model automatically segregates the major classes present in the scene and produces the segmentation map. Experimental results on the Vaihingen dataset demonstrate the efficacy of the proposed method.

[1]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Cordelia Schmid,et al.  What makes for good views for contrastive learning , 2020, NeurIPS.

[3]  Alexei A. Efros,et al.  Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Xiao Xiang Zhu,et al.  Unsupervised Deep Joint Segmentation of Multitemporal High-Resolution Images , 2020, IEEE Transactions on Geoscience and Remote Sensing.

[5]  Matthijs Douze,et al.  Deep Clustering for Unsupervised Learning of Visual Features , 2018, ECCV.

[6]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  En Zhu,et al.  Deep Clustering with Convolutional Autoencoders , 2017, ICONIP.

[8]  Pierre Alliez,et al.  High-Resolution Aerial Image Labeling With Convolutional Neural Networks , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[9]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[10]  Jamie Sherrah,et al.  Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery , 2016, ArXiv.

[11]  Michele Volpi,et al.  Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks , 2016, IEEE Transactions on Geoscience and Remote Sensing.

[12]  Kaiming He,et al.  Improved Baselines with Momentum Contrastive Learning , 2020, ArXiv.

[13]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[14]  Francesca Bovolo,et al.  Change Detection in Image Time-Series Using Unsupervised LSTM , 2020, IEEE Geoscience and Remote Sensing Letters.

[15]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[16]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Uwe Stilla,et al.  Classification With an Edge: Improving Semantic Image Segmentation with Boundary Detection , 2016, ISPRS Journal of Photogrammetry and Remote Sensing.

[18]  Xiao Xiang Zhu,et al.  Semantic Segmentation of Remote Sensing Images With Sparse Annotations , 2021, IEEE Geoscience and Remote Sensing Letters.

[19]  Biplab Banerjee,et al.  Semantic Guided Deep Unsupervised Image Segmentation , 2019, ICIAP.

[20]  Nikos Komodakis,et al.  Unsupervised Representation Learning by Predicting Image Rotations , 2018, ICLR.

[21]  Xiao Xiang Zhu,et al.  Relation Matters: Relational Context-Aware Fully Convolutional Network for Semantic Segmentation of High-Resolution Aerial Images , 2020, IEEE Transactions on Geoscience and Remote Sensing.

[22]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[23]  Sudipan Saha,et al.  Building Change Detection in VHR SAR Images via Unsupervised Deep Transcoding , 2020, IEEE Transactions on Geoscience and Remote Sensing.

[24]  Lorenzo Bruzzone,et al.  Semantic Segmentation of Large-Size VHR Remote Sensing Images Using a Two-Stage Multiscale Training Architecture , 2020, IEEE Transactions on Geoscience and Remote Sensing.