Context-Reinforced Semantic Segmentation

Recent efforts have shown the importance of context on deep convolutional neural network based semantic segmentation. Among others, the predicted segmentation map (p-map) itself which encodes rich high-level semantic cues (e.g. objects and layout) can be regarded as a promising source of context. In this paper, we propose a dedicated module, Context Net, to better explore the context information in p-maps. Without introducing any new supervisions, we formulate the context learning problem as a Markov Decision Process and optimize it using reinforcement learning during which the p-map and Context Net are treated as environment and agent, respectively. Through adequate explorations, the Context Net selects the information which has long-term benefit for segmentation inference. By incorporating the Context Net with a baseline segmentation scheme, we then propose a Context-reinforced Semantic Segmentation network (CiSS-Net), which is fully end-to-end trainable. Experimental results show that the learned context brings 3.9% absolute improvement on mIoU over the baseline segmentation method, and the CiSS-Net achieves the state-of-the-art segmentation performance on ADE20K, PASCAL-Context and Cityscapes.

[1]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[2]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[3]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  J. Mixter Fast , 2012 .

[5]  Xiaogang Wang,et al.  Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Yi Yang,et al.  Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Ming-Hsuan Yang,et al.  Scene Parsing with Global Context Embedding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[9]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Gang Wang,et al.  Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Bastian Leibe,et al.  Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Ian D. Reid,et al.  RefineNet : MultiPath Refinement Networks with Identity Mappings for High-Resolution Semantic Segmentation , 2016 .

[13]  Yang Wang,et al.  Gated Feedback Refinement Network for Dense Image Labeling , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Iasonas Kokkinos,et al.  Fast, Exact and Multi-scale Inference for Semantic Image Segmentation with Deep Gaussian CRFs , 2016, ECCV.

[15]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[16]  Charless C. Fowlkes,et al.  Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation , 2016, ECCV.

[17]  Rama Chellappa,et al.  Gaussian Conditional Random Field Network for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xiaoxiao Li,et al.  Semantic Image Segmentation via Deep Parsing Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[22]  Raquel Urtasun,et al.  Fully Connected Deep Structured Networks , 2015, ArXiv.

[23]  Shuicheng Yan,et al.  Multi-Path Feedback Recurrent Neural Networks for Scene Parsing , 2016, AAAI.

[24]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Jian Sun,et al.  Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Ronan Collobert,et al.  Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[27]  Shu Kong,et al.  Recurrent Scene Parsing with Perspective Understanding in the Loop , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[30]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[32]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[33]  Sheng Tang,et al.  Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions , 2017, IJCAI.

[34]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[35]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[36]  Bolei Zhou,et al.  Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.

[37]  Eric P. Xing,et al.  Dynamic-Structured Semantic Propagation Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).