Semantic Segmentation With Context Encoding and Multi-Path Decoding

Semantic image segmentation aims to classify every pixel of a scene image to one of many classes. It implicitly involves object recognition, localization, and boundary delineation. In this paper, we propose a segmentation network called CGBNet to enhance the segmentation performance by context encoding and multi-path decoding. We first propose a context encoding module that generates context-contrasted local feature to make use of the informative context and the discriminative local information. This context encoding module greatly improves the segmentation performance, especially for inconspicuous objects. Furthermore, we propose a scale-selection scheme to selectively fuse the segmentation results from different-scales of features at every spatial position. It adaptively selects appropriate score maps from rich scales of features. To improve the segmentation performance results at boundary, we further propose a boundary delineation module that encourages the location-specific very-low-level features near the boundaries to take part in the final prediction and suppresses them far from the boundaries. The proposed segmentation network achieves very competitive performance in terms of all three different evaluation metrics consistently on the six popular scene segmentation datasets, Pascal Context, SUN-RGBD, Sift Flow, COCO Stuff, ADE20K, and Cityscapes.

[1]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Sinisa Segvic,et al.  Convolutional Scale Invariance for Semantic Segmentation , 2016, GCPR.

[4]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Antonio Torralba,et al.  Nonparametric scene parsing: Label transfer via dense scene alignment , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Yunchao Wei,et al.  Reversible Recursive Instance-Level Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[8]  Gang Wang,et al.  Toward Achieving Robust Low-Level and High-Level Scene Parsing , 2019, IEEE Transactions on Image Processing.

[9]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[10]  Peter V. Gehler,et al.  Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Svetlana Lazebnik,et al.  Finding Things: Image Parsing with Regions and Per-Exemplar Detectors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[14]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[16]  Eric P. Xing,et al.  Dynamic-Structured Semantic Propagation Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Ronan Collobert,et al.  Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[19]  Ming-Yu Liu,et al.  Recursive Context Propagation Network for Semantic Scene Labeling , 2014, NIPS.

[20]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[22]  Marcus Liwicki,et al.  Scene labeling with LSTM recurrent neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Gregory Shakhnarovich,et al.  Feedforward semantic segmentation with zoom-out features , 2014, CVPR.

[25]  Gang Wang,et al.  Boundary-Aware Feature Propagation for Scene Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[26]  Ming-Hsuan Yang,et al.  Scene Parsing with Global Context Embedding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  J. Schmidhuber,et al.  Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS 2008.

[28]  Xiaoxiao Li,et al.  Semantic Image Segmentation via Deep Parsing Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Xudong Jiang,et al.  Semantic Correlation Promoted Shape-Variant Context for Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Gang Wang,et al.  Feature Boosting Network For 3D Pose Estimation , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[34]  Charless C. Fowlkes,et al.  Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation , 2016, ECCV.

[35]  Jian Sun,et al.  Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Greg Mori,et al.  LabelBank: Revisiting Global Perspectives for Semantic Segmentation , 2017, ArXiv.

[37]  Xudong Jiang,et al.  Dermoscopic Image Segmentation Through the Enhanced High-Level Parsing and Class Weighted Loss , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[38]  Xiaogang Wang,et al.  Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[40]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[42]  Jonathan T. Barron,et al.  The Fast Bilateral Solver , 2015, ECCV.

[43]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[44]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[45]  Zhuowen Tu,et al.  Convolutional Pseudo-Prior for Structured Labeling , 2015, ArXiv.

[46]  Ming-Hsuan Yang,et al.  Context Driven Scene Parsing with Attention to Rare Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Zhuowen Tu,et al.  Top-Down Learning for Structured Labeling with Convolutional Pseudoprior , 2015, ECCV.

[48]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[49]  Gang Wang,et al.  Scene Segmentation with DAG-Recurrent Neural Networks , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Jonathan T. Barron,et al.  A category-level 3-D object dataset: Putting the Kinect to work , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[51]  Andrew Owens,et al.  SUN3D: A Database of Big Spaces Reconstructed Using SfM and Object Labels , 2013, 2013 IEEE International Conference on Computer Vision.

[52]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[53]  Ligeng Zhu,et al.  Small Object Sensitive Segmentation of Urban Street Scene With Spatial Adjacency Between Object Classes , 2019, IEEE Transactions on Image Processing.

[54]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[55]  Kun Yu,et al.  DenseASPP for Semantic Segmentation in Street Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[56]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Piotr Bilinski,et al.  Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Jianxiong Xiao,et al.  SUN RGB-D: A RGB-D scene understanding benchmark suite , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Garrison W. Cottrell,et al.  Understanding Convolution for Semantic Segmentation , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[61]  Philip H. S. Torr,et al.  Higher Order Conditional Random Fields in Deep Neural Networks , 2015, ECCV.

[62]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[63]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[64]  S. Vehmas Staring: how we look , 2010 .

[65]  Gang Yu,et al.  Learning a Discriminative Feature Network for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[66]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Anton van den Hengel,et al.  Bridging Category-level and Instance-level Semantic Image Segmentation , 2016, ArXiv.

[68]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Yoshua Bengio,et al.  ReSeg: A Recurrent Neural Network-Based Model for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[70]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[71]  Gang Wang,et al.  Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[72]  Vittorio Ferrari,et al.  COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[73]  Jürgen Schmidhuber,et al.  Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks , 2008, NIPS.

[74]  Xinlei Chen,et al.  PixelNet: Towards a General Pixel-level Architecture , 2016, ArXiv.

[75]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[76]  Gang Wang,et al.  Graininess-Aware Deep Feature Learning for Pedestrian Detection , 2018, ECCV.

[77]  Roberto Cipolla,et al.  Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding , 2015, BMVC.

[78]  Ting Liu,et al.  Recent advances in convolutional neural networks , 2015, Pattern Recognit..

[79]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[80]  Xudong Jiang,et al.  DeepDeblur: text image recovery from blur to sharp , 2019, Multimedia Tools and Applications.

[81]  Yi Yang,et al.  Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[82]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[83]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[84]  Dieter Fox,et al.  RGB-(D) scene labeling: Features and algorithms , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[85]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[86]  Hong Zhang,et al.  Multi-scale Patch Aggregation (MPA) for Simultaneous Detection and Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).