Fast Semantic Image Segmentation with High Order Context and Guided Filtering

This paper describes a fast and accurate semantic image segmentation approach that encodes three kinds of information: discriminative features from deep neural networks, high-order context compatibility among adjacent objects, and low-level image features. We formulate the underlying problem as a conditional random field that embeds local feature extraction, clique potential construction, and guided filtering within the same framework, and we provide an efficient coarse-to-fine solver. At the coarse level, we combine local feature representation and context interaction using a deep convolutional network. We learn the interaction directly from high-order cliques with a message passing routine, which avoids time-consuming explicit graph inference for the joint probability distribution. At the fine level, we introduce a guided filtering interpretation of the mean field algorithm, which recovers accurate object boundaries more than 100 times faster than classic learning methods. The two parts are connected and jointly trained end to end. Experimental results on the PASCAL VOC 2012 dataset show that the proposed algorithm outperforms state-of-the-art methods and ranked first at the time of submission, demonstrating the effectiveness of this unified framework for semantic image segmentation.
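The abstract does not spell out the fine-level formulation. The following is a minimal NumPy sketch of the general idea of a mean field update whose message passing step is a guided filter. It rests on several assumptions. It uses a grayscale guide image and the guided filter of He, Sun and Tang (ECCV 2010). It applies a Potts label compatibility with a single hand-set weight `w`, and it learns no parameters. The names `box_filter`, `guided_filter`, `mean_field_guided` and the parameters `r`, `eps`, `w`, `iters` are hypothetical and are not taken from the paper. Unlike the dense CRF of Krähenbühl and Koltun, the filtered message here also includes each pixel's own marginal, which is a deliberate simplification.

```python
import numpy as np


def box_filter(x, r):
    """Mean of a 2-D array over (2r+1)x(2r+1) windows, via an integral image.

    Windows are clipped at the image border, so border means use fewer pixels.
    """
    h, w = x.shape
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = x.cumsum(axis=0).cumsum(axis=1)
    ys, xs = np.arange(h), np.arange(w)
    y0, y1 = np.clip(ys - r, 0, h), np.clip(ys + r + 1, 0, h)
    x0, x1 = np.clip(xs - r, 0, w), np.clip(xs + r + 1, 0, w)
    total = (integral[np.ix_(y1, x1)] - integral[np.ix_(y0, x1)]
             - integral[np.ix_(y1, x0)] + integral[np.ix_(y0, x0)])
    count = np.outer(y1 - y0, x1 - x0)
    return total / count


def guided_filter(guide, p, r, eps):
    """Guided filter with a grayscale guide: q = mean(a) * guide + mean(b)."""
    mean_i = box_filter(guide, r)
    mean_p = box_filter(p, r)
    cov_ip = box_filter(guide * p, r) - mean_i * mean_p
    var_i = box_filter(guide * guide, r) - mean_i * mean_i
    a = cov_ip / (var_i + eps)          # per-window linear coefficient
    b = mean_p - a * mean_i             # per-window offset
    return box_filter(a, r) * guide + box_filter(b, r)


def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def mean_field_guided(unary, guide, iters=5, w=1.0, r=4, eps=1e-2):
    """Mean field with guided-filter messages (hypothetical sketch).

    unary: (L, H, W) unary energies, e.g. negative log-probabilities from a CNN.
    guide: (H, W) grayscale guide image with values in [0, 1].
    Returns (L, H, W) approximate marginals.
    """
    q = softmax(-unary, axis=0)
    for _ in range(iters):
        # Message passing: edge-aware smoothing of each label's marginal.
        msg = np.stack([guided_filter(guide, q[l], r, eps)
                        for l in range(q.shape[0])])
        # Potts compatibility: penalise label l by the filtered mass of the other labels.
        pairwise = w * (msg.sum(axis=0, keepdims=True) - msg)
        q = softmax(-unary - pairwise, axis=0)
    return q


if __name__ == "__main__":
    # Synthetic two-label example: a vertical step edge plus one noisy unary.
    guide = np.zeros((20, 20))
    guide[:, 10:] = 1.0
    truth = (guide > 0.5).astype(int)
    p1 = np.where(truth == 1, 0.8, 0.2)  # CNN-like P(label 1)
    p1[10, 4] = 0.6                      # one pixel wrongly favours label 1
    unary = -np.log(np.stack([1 - p1, p1]))
    q = mean_field_guided(unary, guide, iters=5, w=1.0, r=2, eps=1e-2)
    print("label at noisy pixel:", q.argmax(axis=0)[10, 4])
    print("matches ground truth:", bool((q.argmax(axis=0) == truth).all()))
```

In this toy example, the guided filter smooths the label marginals within each flat region of the guide but keeps the step edge sharp. This is why the single noisy pixel is pulled toward its neighbours' label while the boundary stays in place. All of the filtering is done with box means, so each iteration costs time linear in the number of pixels, independent of `r`.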
