Mining Better Samples and Semantic Consistency for Contrastive Learning in Forest Semantic Segmentation

Image segmentation has made impressive progress in the past several years. However, good segmentation usually relies on pixel-wise, well-annotated labels, which are laborious to obtain, and robustness cannot be guaranteed when the training data lack diversity. Previous work usually treats pixels individually and pays little attention to neighboring pixels, so local context is scarce and global context is not exploited. We propose a method, named Forest Semantic Segmentation Network (FSSNet), to address these issues. FSSNet feeds the original and an augmented version of each image into a student branch and a teacher branch, respectively, and forces the two outputs to be consistent, strengthening the robustness of the model. Moreover, we consider not only each pixel itself but also its neighboring pixels, because the context of neighboring pixels helps to understand the pixel. FSSNet uses a contrastive loss with a memory bank to involve global context in training, pulling pixels closer to others of the same category and pushing them away from pixels of different categories. A bank filter is proposed to improve the quality of the features stored in the memory bank. We also propose a new sampling strategy that improves the effect of the contrastive loss and reduces computation. Our method improves accuracy and strengthens robustness with affordable extra computation during training and no additional computation during inference compared with the baseline. Compared with the benchmark, the proposed approach improves mIoU by 3.1% on our challenging dataset.
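As a rough illustration of the two mechanisms described above, the sketch below shows a student/teacher consistency term and a pixel-wise InfoNCE contrastive loss computed against memory-bank features. It is not the paper's implementation; the function names, the MSE form of the consistency term, and the temperature value are assumptions made for the example.

# Minimal sketch (assumed, not the authors' code) of consistency and
# memory-bank contrastive losses for pixel embeddings.
import torch
import torch.nn.functional as F


def consistency_loss(student_logits, teacher_logits):
    # Force the student's prediction on the augmented view to match the
    # teacher's prediction on the original view; the teacher is not
    # back-propagated (detach). MSE between class probabilities is one
    # possible choice of consistency measure.
    return F.mse_loss(student_logits.softmax(dim=1),
                      teacher_logits.softmax(dim=1).detach())


def pixel_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    # InfoNCE over pixel embeddings.
    #   anchor:    (D,)   embedding of one sampled pixel
    #   positives: (P, D) memory-bank features of the same category
    #   negatives: (N, D) memory-bank features of other categories
    anchor = F.normalize(anchor, dim=0)
    positives = F.normalize(positives, dim=1)
    negatives = F.normalize(negatives, dim=1)

    pos_logits = positives @ anchor / temperature   # (P,)
    neg_logits = negatives @ anchor / temperature   # (N,)

    # Contrast each positive against all negatives; the positive sits in
    # column 0, so the target label for every row is 0.
    logits = torch.cat([pos_logits.unsqueeze(1),
                        neg_logits.unsqueeze(0).expand(len(pos_logits), -1)],
                       dim=1)
    labels = torch.zeros(len(pos_logits), dtype=torch.long)
    return F.cross_entropy(logits, labels)

In this sketch the memory bank would simply be a per-category queue of detached pixel features; which pixels are sampled as anchors, positives, and negatives is where the paper's bank filter and sampling strategy would come in.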
