GFF: Gated Fully Fusion for Semantic Segmentation

Semantic segmentation generates comprehensive understanding of scenes through densely predicting the category for each pixel. High-level features from Deep Convolutional Neural Networks already demonstrate their effectiveness in semantic segmentation tasks, however the coarse resolution of high-level features often leads to inferior results for small/thin objects where detailed information is important. It is natural to consider importing low level features to compensate for the lost detailed information in high-level features.Unfortunately, simply combining multi-level features suffers from the semantic gap among them. In this paper, we propose a new architecture, named Gated Fully Fusion (GFF), to selectively fuse features from multiple levels using gates in a fully connected way. Specifically, features at each level are enhanced by higher-level features with stronger semantics and lower-level features with more details, and gates are used to control the propagation of useful information which significantly reduces the noises during fusion. We achieve the state of the art results on four challenging scene parsing datasets including Cityscapes, Pascal Context, COCO-stuff and ADE20K.

[1]  Shu Kong,et al.  Recurrent Scene Parsing with Perspective Understanding in the Loop , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[3]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[5]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[6]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[7]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Nicu Sebe,et al.  PAD-Net: Multi-tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[11]  Hong Liu,et al.  Expectation-Maximization Attention Networks for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[12]  Gang Yu,et al.  Learning a Discriminative Feature Network for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Gang Yu,et al.  BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation , 2018, ECCV.

[14]  Garrison W. Cottrell,et al.  Understanding Convolution for Semantic Segmentation , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[15]  Dani Lischinski,et al.  Multi-scale Context Intertwining for Semantic Segmentation , 2018, ECCV.

[16]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Bolei Zhou,et al.  Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.

[18]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[19]  Eric P. Xing,et al.  Dynamic-Structured Semantic Propagation Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Gang Wang,et al.  Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[23]  Kun Yu,et al.  DenseASPP for Semantic Segmentation in Street Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Nikos Komodakis,et al.  Wide Residual Networks , 2016, BMVC.

[25]  Piotr Bilinski,et al.  Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Michael Kampffmeyer,et al.  Semantic Segmentation of Small Objects and Modeling of Uncertainty in Urban Remote Sensing Images Using Deep Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[28]  Abhinav Gupta,et al.  Beyond Grids: Learning Graph Representations for Visual Recognition , 2018, NeurIPS.

[29]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[30]  Sanja Fidler,et al.  Gated-SCNN: Gated Shape CNNs for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Sheng Tang,et al.  Scale-Adaptive Convolutions for Scene Parsing , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Jian Sun,et al.  ExFuse: Enhancing Feature Fusion for Semantic Segmentation , 2018, ECCV.

[34]  Yang Gao,et al.  End-to-End Learning of Driving Models from Large-Scale Video Datasets , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Jun Fu,et al.  Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[37]  Vittorio Ferrari,et al.  COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Xudong Jiang,et al.  Semantic Correlation Promoted Shape-Variant Context for Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Yunchao Wei,et al.  CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41]  Jürgen Schmidhuber,et al.  Training Very Deep Networks , 2015, NIPS.

[42]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[44]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[45]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Xiangyu Zhang,et al.  Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Xiang Bai,et al.  Asymmetric Non-Local Neural Networks for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  George Papandreou,et al.  Searching for Efficient Multi-Scale Architectures for Dense Image Prediction , 2018, NeurIPS.

[50]  Anton van den Hengel,et al.  Wider or Deeper: Revisiting the ResNet Model for Visual Recognition , 2016, Pattern Recognit..

[51]  Yuning Jiang,et al.  Unified Perceptual Parsing for Scene Understanding , 2018, ECCV.

[52]  Xiaogang Wang,et al.  Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Peter Kontschieder,et al.  The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[54]  Li Fei-Fei,et al.  Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Shuicheng Yan,et al.  Semantic Segmentation via Structured Patch Prediction, Context CRF and Guidance CRF , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Yi Zhang,et al.  PSANet: Point-wise Spatial Attention Network for Scene Parsing , 2018, ECCV.

[57]  Jingdong Wang,et al.  OCNet: Object Context Network for Scene Parsing , 2018, ArXiv.

[58]  Stella X. Yu,et al.  Adaptive Affinity Fields for Semantic Segmentation , 2018, ECCV.