Contrastive Transformer: Contrastive Learning Scheme with Transformer innate Patches

This paper presents Contrastive Transformer, a contrastive learning scheme that uses the patches innate to Transformer architectures. Contrastive Transformer lets existing contrastive learning techniques, which are mostly used for image classification, benefit dense downstream prediction tasks such as semantic segmentation. The scheme performs supervised patch-level contrastive learning: patches are selected based on the ground-truth mask and then used for hard-negative and hard-positive sampling. The scheme applies to all vision-transformer architectures, is easy to implement, and adds minimal memory overhead. It also removes the need for huge batch sizes, because each patch is treated as an image. We apply and test Contrastive Transformer on aerial image segmentation, a task known for low-resolution data, large class imbalance, and similar semantic classes. Extensive experiments on the ISPRS Potsdam aerial image segmentation dataset show the efficacy of the scheme, and applying it to multiple inherently different Transformer architectures shows its generalizability. The results show a consistent increase in mean IoU across all classes.
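
The idea of supervised patch-level contrast can be illustrated with a minimal sketch. It is not the implementation from the paper: the abstract does not specify the loss or the sampling rule, so the sketch assumes a majority-vote label per patch and a standard supervised InfoNCE-style loss over cosine similarities, and it omits hard-negative and hard-positive sampling entirely. The names `patch_labels`, `supervised_patch_contrastive_loss`, and the `temperature` parameter are hypothetical.

```python
import math
from collections import Counter


def patch_labels(mask, patch):
    """Assign each patch the majority class of its ground-truth pixels.

    Assumption: mask height and width are divisible by `patch`.
    """
    h, w = len(mask), len(mask[0])
    labels = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            pixels = [mask[i][j]
                      for i in range(r, r + patch)
                      for j in range(c, c + patch)]
            labels.append(Counter(pixels).most_common(1)[0][0])
    return labels


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def supervised_patch_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised InfoNCE-style loss where every patch acts as a sample.

    Patches sharing a label are positives; all other patches are negatives.
    Anchors without any positive are skipped.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-likelihood over this anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy example: a 4x4 mask split into four 2x2 patches.
mask = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [1, 1, 0, 1]]
labels = patch_labels(mask, patch=2)
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
loss = supervised_patch_contrastive_loss(embeddings, labels)
```

Because the patches of a single image already supply many positives and negatives, a loss of this shape does not depend on a large batch of images, which is consistent with the claim above that each patch is treated as an image.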
