Robust Semi-Supervised Semantic Segmentation Based on Self-Attention and Spectral Normalization

Adversarial learning for semi-supervised semantic image segmentation with convolutional neural networks can effectively reduce the number of manually generated labels required for training. However, the convolution operator in the generator of the generative adversarial network (GAN) has a local receptive field, so long-range dependencies between different image regions can be modeled only after passing through multiple convolutional layers. The present work addresses this issue by introducing a self-attention mechanism in the generator of the GAN to account for relationships between widely separated spatial regions of the input image, with supervision based on pixel-level ground truth data. In addition, the adjustment of the discriminator has been shown to affect the stability of GAN training. This is addressed by applying spectral normalization to the GAN discriminator during training. The proposed stable self-attention adversarial learning network for semi-supervised semantic image segmentation is shown to provide superior segmentation performance compared with current semi-supervised and fully supervised semantic image segmentation techniques.
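The two building blocks named above can be illustrated with a minimal, dependency-free sketch. It is not the network described in this work: the function names, the use of unprojected features as queries, keys, and values, and the residual weight `gamma` are assumptions made for illustration only. The first function lets every spatial position attend to every other position in a single step, and the second rescales a weight matrix by its largest singular value, estimated by power iteration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, gamma=1.0):
    """Dot-product self-attention over N flattened spatial positions.

    X is a list of N feature vectors. Assumption: queries, keys and values
    are the raw features (no learned projections). Returns gamma * A X + X
    together with the N x N attention map A.
    """
    scores = [[sum(a * b for a, b in zip(q, k)) for k in X] for q in X]
    attn = [softmax(r) for r in scores]
    out = []
    for i, x in enumerate(X):
        # Context vector: attention-weighted sum over all positions.
        ctx = [sum(attn[i][j] * X[j][c] for j in range(len(X)))
               for c in range(len(x))]
        out.append([gamma * c + xi for c, xi in zip(ctx, x)])
    return out, attn


def _l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_normalize(W, n_iters=50, seed=0):
    """Divide W by its largest singular value, estimated by power iteration.

    W is a non-zero matrix given as a list of rows. Returns (W / sigma, sigma).
    """
    rows, cols = len(W), len(W[0])
    rng = random.Random(seed)
    u = _l2_normalize([rng.gauss(0, 1) for _ in range(rows)])
    v = [0.0] * cols
    for _ in range(n_iters):
        # v <- normalize(W^T u), then u <- normalize(W v)
        v = _l2_normalize([sum(W[i][j] * u[i] for i in range(rows))
                           for j in range(cols)])
        u = _l2_normalize([sum(w * x for w, x in zip(row, v)) for row in W])
    Wv = [sum(w * x for w, x in zip(row, v)) for row in W]
    sigma = sum(ui * wi for ui, wi in zip(u, Wv))
    return [[w / sigma for w in row] for row in W], sigma


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, attn = self_attention(X, gamma=0.5)
    print(attn[0])  # how strongly position 0 attends to every position
    W_sn, sigma = spectral_normalize([[3.0, 0.0], [0.0, 1.0]])
    print(sigma, W_sn)
```

In a practical setting the attention would operate on learned projections of convolutional feature maps, and the normalization would be applied to each discriminator layer's weights at every training step, typically with a single power-iteration step carried over between updates. The sketch fixes neither choice.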
