Higher Order Potentials in End-to-End Trainable Conditional Random Fields

We tackle the problem of semantic segmentation using deep learning techniques. Most semantic segmentation systems include a Conditional Random Field (CRF) model to produce a structured output that is consistent with visual features of the image. With recent advances in deep learning, it is becoming increasingly common to perform CRF inference within a deep neural network to facilitate joint learning of the CRF with a pixel-wise Convolutional Neural Network (CNN) classifier. While basic CRFs use only unary and pairwise potentials, it has been shown that the addition of higher order potentials defined on cliques with more than two nodes can result in a better segmentation outcome. In this paper, we show that two types of higher order potential, namely, object detection based potentials and superpixel based potentials, can be included in a CRF embedded within a deep network. We design these higher order potentials to allow inference with the efficient and differentiable mean-field algorithm, making it possible to implement our CRF model as a stack of layers in a deep network. As a result, all parameters of our richer CRF model can be jointly learned with a CNN classifier during the end-to-end training of the entire network. We find significant improvement in the results with the introduction of these trainable higher order potentials.

[1]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[3]  Yi Yang,et al.  Layered Object Models for Image Segmentation , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Sanja Fidler,et al.  Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Joost van de Weijer,et al.  Harmony potentials for joint classification and segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[6]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[7]  Vladlen Koltun,et al.  Parameter Learning and Convergent Inference for Dense Random Fields , 2013, ICML.

[8]  Martial Hebert,et al.  Learning message-passing inference machines for structured prediction , 2011, CVPR 2011.

[9]  Pushmeet Kohli,et al.  P3 & Beyond: Solving Energies with Higher Order Cliques , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Pushmeet Kohli,et al.  Graph Cut Based Inference with Co-occurrence Statistics , 2010, ECCV.

[11]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Silvio Savarese,et al.  Relating Things and Stuff via ObjectProperty Interactions , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Stephen Gould,et al.  Region-based Segmentation and Object Detection , 2009, NIPS.

[15]  Ian D. Reid,et al.  Deeply Learning the Messages in Message Passing Inference , 2015, NIPS.

[16]  Justin Domke,et al.  Learning Graphical Model Parameters with Approximate Marginal Inference , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Pushmeet Kohli,et al.  Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Pushmeet Kohli,et al.  Robust Higher Order Potentials for Enforcing Label Consistency , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[20]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[21]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[22]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[23]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[24]  Philip H. S. Torr,et al.  What, Where and How Many? Combining Object Detectors and CRFs , 2010, ECCV.

[25]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Jian Sun,et al.  Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Sebastian Ramos,et al.  The Cityscapes Dataset , 2015, CVPR 2015.

[28]  Andrea Vedaldi,et al.  Objects in Context , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[29]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[30]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[32]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[33]  Vibhav Vineet,et al.  Filter-Based Mean-Field Inference for Random Fields with Higher-Order Terms and Product Label-Spaces , 2012, International Journal of Computer Vision.

[34]  Raquel Urtasun,et al.  Fully Connected Deep Structured Networks , 2015, ArXiv.

[35]  Jitendra Malik,et al.  Deformable part models are convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Daniel P. Huttenlocher,et al.  Efficient Graph-Based Image Segmentation , 2004, International Journal of Computer Vision.

[37]  Xiaoxiao Li,et al.  Semantic Image Segmentation via Deep Parsing Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[38]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[39]  Lei Chen,et al.  Deep Structured Models For Group Activity Recognition , 2015, BMVC.

[40]  Tsuhan Chen,et al.  Learning class-specific affinities for image labelling , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Peter V. Gehler,et al.  Human Pose Estimation with Fields of Parts , 2014, ECCV.

[42]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[43]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[44]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[45]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).