FC-RCCN: Fully convolutional residual continuous CRF network for semantic segmentation

Abstract: Enlarging the spatial resolution of features generated by fully convolutional networks (FCNs) can improve the performance of semantic segmentation. To achieve this goal, a deeper network with a deconvolutional structure can be applied. However, as the network architecture becomes more complex, training efficiency may degrade. To jointly address the problems of improving spatial resolution through a deeper network and of training such a network effectively, we propose a Fully Convolutional Residual Continuous CRF Network (FC-RCCN) for semantic segmentation. FC-RCCN is composed of three subnetworks: a unary network, a pairwise network, and a superpixel-based continuous conditional random field (C-CRF) network. In order to generate high-quality, full-spatial-resolution predictions, a residual-block-based unary network with multi-scale feature fusion is proposed. Even though the unary network is deep, the whole framework can be trained effectively end-to-end using a joint pixel-level and superpixel-level supervised learning strategy, optimized by a pixel-level softmax cross-entropy loss and a superpixel-level log-likelihood loss. In addition, C-CRF inference is fused with the pixel-level prediction at test time, which makes the method robust to superpixel errors. In the experiments, we comprehensively evaluate the contributions of the three subnetworks and the learning strategy. Experiments on three benchmark datasets demonstrate that the proposed FC-RCCN outperforms previous segmentation methods and achieves state-of-the-art performance.
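A minimal sketch of how the joint pixel-level and superpixel-level supervision described above could be combined, written in PyTorch-style Python. The tensor shapes, the balancing weight, and the exact form of the superpixel log-likelihood term are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def joint_loss(pixel_logits, pixel_labels, sp_logits, sp_labels, weight=1.0):
    """Sketch of joint pixel-level + superpixel-level supervision.

    pixel_logits: (N, C, H, W) dense scores from the unary network.
    pixel_labels: (N, H, W) ground-truth class indices per pixel.
    sp_logits:    (N, C, S) per-superpixel scores (e.g. after C-CRF inference).
    sp_labels:    (N, S) class index assigned to each superpixel.
    `weight` balancing the two terms is an assumed hyperparameter.
    """
    # Pixel-level softmax cross-entropy over the dense prediction map.
    pixel_term = F.cross_entropy(pixel_logits, pixel_labels, ignore_index=255)

    # Superpixel-level negative log-likelihood on per-superpixel scores.
    sp_term = F.nll_loss(F.log_softmax(sp_logits, dim=1), sp_labels)

    return pixel_term + weight * sp_term
```

In this sketch the two terms are simply summed with a scalar weight; the paper's actual C-CRF log-likelihood may be computed differently, but the structure of "dense cross-entropy plus superpixel-level likelihood" follows the learning strategy stated in the abstract.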
