Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation

Semi-supervised semantic segmentation (SSS) has recently gained increasing research interest because it reduces the need for large-scale, fully annotated training data. Current methods often suffer from confirmation bias in the pseudo-labelling process, which can be alleviated by a co-training framework. Existing co-training-based SSS methods rely on hand-crafted perturbations to prevent the different sub-nets from collapsing into each other, but such artificial perturbations cannot lead to an optimal solution. In this work, we propose a new conflict-based cross-view consistency (CCVC) method based on a two-branch co-training framework, which aims to force the two sub-nets to learn informative features from irrelevant views. In particular, we first propose a new cross-view consistency (CVC) strategy that encourages the two sub-nets to learn distinct features from the same input by introducing a feature discrepancy loss, while these distinct features are still expected to produce consistent prediction scores for the input. The CVC strategy helps prevent the two sub-nets from collapsing into each other. In addition, we propose a conflict-based pseudo-labelling (CPL) method to ensure that the model learns more useful information from conflicting predictions, which leads to a stable training process. We validate CCVC on SSS benchmark datasets, where it achieves new state-of-the-art performance. Our code is available at https://github.com/xiaoyao3302/CCVC.
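The abstract names a feature discrepancy loss (CVC) and conflict-based pseudo-labelling (CPL) but does not give their exact form. Below is a minimal, framework-free sketch that operates on flattened per-pixel features and softmax probabilities. It rests on several assumptions: (i) discrepancy is measured with cosine similarity, using the shifted form 1 + cos so the loss stays in [0, 2]; (ii) one branch's confident predictions serve as pseudo-labels for the other branch; and (iii) pixels where the two branches disagree are up-weighted. All function names and the values `threshold=0.95` and `conflict_weight=2.0` are hypothetical illustrations. This is not the code from the CCVC repository.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def feature_discrepancy_loss(feats_1: Sequence[Vector], feats_2: Sequence[Vector]) -> float:
    """Assumed CVC discrepancy loss: mean over pixels of 1 + cos(f1, f2).

    The value lies in [0, 2]. It is smallest when the two sub-nets' features
    for the same pixel point in opposite directions, i.e. are most dissimilar.
    """
    terms = [1.0 + cosine_similarity(f1, f2) for f1, f2 in zip(feats_1, feats_2)]
    return sum(terms) / len(terms)


def argmax(scores: Vector) -> int:
    """Index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def conflict_based_pseudo_labels(
    probs_src: Sequence[Vector],
    probs_tgt: Sequence[Vector],
    threshold: float = 0.95,
    conflict_weight: float = 2.0,
) -> Tuple[List[int], List[float]]:
    """Hypothetical CPL: pseudo-labels from the source branch for the target branch.

    A pixel's weight is 0 if the source branch is not confident, conflict_weight
    if the source is confident but the two branches predict different classes,
    and 1 otherwise.
    """
    labels: List[int] = []
    weights: List[float] = []
    for p_src, p_tgt in zip(probs_src, probs_tgt):
        label = argmax(p_src)
        if p_src[label] < threshold:
            weight = 0.0  # low-confidence pseudo-label is ignored
        elif label != argmax(p_tgt):
            weight = conflict_weight  # confident but conflicting: emphasised
        else:
            weight = 1.0  # confident and agreeing: normal weight
        labels.append(label)
        weights.append(weight)
    return labels, weights


def weighted_cross_entropy(
    probs: Sequence[Vector], labels: Sequence[int], weights: Sequence[float], eps: float = 1e-12
) -> float:
    """Cross-entropy of the target branch against pseudo-labels, normalised by total weight."""
    total = sum(weights)
    if total == 0:
        return 0.0  # no pixel passed the confidence threshold
    loss = sum(-w * math.log(max(p[y], eps)) for p, y, w in zip(probs, labels, weights))
    return loss / total


if __name__ == "__main__":
    # Two pixels with toy 2-D features from each sub-net.
    print(feature_discrepancy_loss([[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [1.0, 0.0]]))  # 0.5
```

In a two-branch setup, the same pseudo-labelling call would also run in the opposite direction, so that each branch supervises the other. The abstract does not state how the discrepancy and pseudo-label terms are weighted against each other in CCVC.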