Context-self contrastive pretraining for crop type semantic segmentation

arXiv:2104.04310v1 [cs.CV] 9 Apr 2021

In this paper we propose a fully supervised pretraining scheme based on contrastive learning that is tailored to dense classification tasks. The proposed Context-Self Contrastive Loss (CSCL) learns an embedding space that makes semantic boundaries pop up by using a similarity metric between every location in a training sample and its local context. For crop type semantic segmentation from satellite images, we find performance at parcel boundaries to be a critical bottleneck and explain how CSCL tackles the underlying cause of that problem, improving the state-of-the-art performance in this task. Additionally, using images from the Sentinel-2 (S2) satellite missions, we compile the largest, to our knowledge, dataset of satellite image time series densely annotated by crop type and parcel identities, which we make publicly available together with the data generation pipeline. Using that data, we find that CSCL, even with minimal pretraining, improves all respective baselines, and we present a process for semantic segmentation at super-resolution for obtaining crop classes at a more granular level. The proposed method is further validated on the task of semantic segmentation on 2D and 3D volumetric images, showing consistent performance improvements upon competitive baselines.
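The idea of comparing every location with its local context can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so the following is only an assumption: it uses cosine similarity as the similarity metric, a square window of hypothetical size `radius` as the local context, and a binary cross-entropy between the rescaled similarity and label agreement. The function names (`cosine`, `local_context`, `context_self_loss`) are illustrative and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def local_context(h, w, height, width, radius):
    """Yield in-bounds neighbours of (h, w) in a square window, excluding (h, w)."""
    for dh in range(-radius, radius + 1):
        for dw in range(-radius, radius + 1):
            nh, nw = h + dh, w + dw
            if (dh, dw) != (0, 0) and 0 <= nh < height and 0 <= nw < width:
                yield nh, nw


def context_self_loss(emb, labels, radius=1):
    """Mean binary cross-entropy between similarity and label agreement.

    emb:    H x W grid of embedding vectors (nested lists).
    labels: H x W grid of class ids.
    The loss form is an assumption made for illustration only.
    """
    height, width = len(labels), len(labels[0])
    total, count = 0.0, 0
    for h in range(height):
        for w in range(width):
            for nh, nw in local_context(h, w, height, width, radius):
                sim = cosine(emb[h][w], emb[nh][nw])
                # Map similarity from [-1, 1] to (0, 1), clamped for a stable log.
                p = min(max((sim + 1.0) / 2.0, 1e-7), 1.0 - 1e-7)
                # Target is 1 when both locations share a label, else 0.
                target = 1.0 if labels[h][w] == labels[nh][nw] else 0.0
                total -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
                count += 1
    return total / count


# Toy example: two parcels separated by a vertical boundary.
labels = [[0, 0, 1], [0, 0, 1]]
a, b = [1.0, 0.0], [-1.0, 0.0]
separated = [[a, a, b], [a, a, b]]  # embeddings differ across the boundary
uniform = [[a, a, a], [a, a, a]]    # embeddings ignore the boundary

loss_separated = context_self_loss(separated, labels)
loss_uniform = context_self_loss(uniform, labels)
print(loss_separated < loss_uniform)
```

In this toy setting, embeddings that change across the parcel boundary incur a lower loss than embeddings that ignore it, which is the behaviour the abstract describes as making semantic boundaries stand out. A practical implementation would vectorize the window extraction on a tensor library rather than loop over locations.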
