Gated Feedback Refinement Network for Dense Image Labeling

Effective integration of local and global contextual information is crucial for dense labeling problems. Most existing methods based on an encoder-decoder architecture simply concatenate features from earlier layers to obtain higher-frequency details in the refinement stages. However, the quality of refinement is limited if ambiguous information is passed forward. In this paper, we propose the Gated Feedback Refinement Network (G-FRNet), an end-to-end deep learning framework for dense labeling tasks that addresses this limitation of existing methods. G-FRNet first makes a coarse prediction and then progressively refines the details by efficiently integrating local and global contextual information during the refinement stages. We introduce gate units that control the information passed forward in order to filter out ambiguity. Experiments on three challenging dense labeling datasets (CamVid, PASCAL VOC 2012, and Horse-Cow Parsing) show the effectiveness of our method. Our proposed approach achieves state-of-the-art results on the CamVid and Horse-Cow Parsing datasets and produces competitive results on the PASCAL VOC 2012 dataset.
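
The coarse-to-fine idea described above can be illustrated with a small sketch. The abstract does not specify how a gate unit is computed, so this example assumes the simplest plausible form: a low-level (high-resolution) feature map is multiplied elementwise by an upsampled high-level (low-resolution) map, and the gated result is added to the upsampled coarse prediction. The function names `upsample_nearest`, `gate_unit`, and `refine` are hypothetical, single-channel 2D maps stand in for real feature tensors, and the learned convolutions of an actual network are omitted.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2D feature map


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling of a 2D map by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


def gate_unit(low_level: Grid, high_level: Grid) -> Grid:
    """Assumed gate: modulate low-level features with upsampled high-level context.

    Positions where the high-level map is near zero are suppressed, so
    ambiguous local detail is filtered before it is passed forward.
    """
    factor = len(low_level) // len(high_level)
    context = upsample_nearest(high_level, factor)
    return [
        [l * c for l, c in zip(low_row, ctx_row)]
        for low_row, ctx_row in zip(low_level, context)
    ]


def refine(coarse: Grid, gated: Grid) -> Grid:
    """Assumed refinement step: upsample the coarse prediction and add gated detail."""
    factor = len(gated) // len(coarse)
    upsampled = upsample_nearest(coarse, factor)
    return [
        [u + g for u, g in zip(up_row, gated_row)]
        for up_row, gated_row in zip(upsampled, gated)
    ]


if __name__ == "__main__":
    low = [[1.0, 2.0], [3.0, 4.0]]   # high-resolution, local detail
    high = [[0.5]]                   # low-resolution, global context
    coarse = [[1.0]]                 # initial coarse prediction
    print(refine(coarse, gate_unit(low, high)))
```

In a multi-stage setting, `refine` would be applied repeatedly, each time with a gated feature map from an earlier, higher-resolution encoder layer.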
