Combining the Best of Graphical Models and ConvNets for Semantic Segmentation

We present a two-module approach to semantic segmentation that combines Convolutional Networks (CNNs) and graphical models. The graphical models generate a small set (5-30) of diverse segmentation proposals with high recall. Because so few proposals are needed, we can afford to extract fairly complex features to rank them. Our complex feature of choice is a novel CNN called SegNet, which directly outputs a (coarse) semantic segmentation. Importantly, SegNet is trained specifically to optimize the corpus-level PASCAL IOU loss function. To the best of our knowledge, this is the first CNN specifically designed for semantic segmentation. This two-module approach achieves $52.5\%$ on the PASCAL 2012 segmentation challenge.
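"Corpus-level" IOU means the intersections and unions for each class are summed over the whole dataset before any ratio is taken. The per-class ratios are then averaged. As a result, the score does not decompose into a sum of per-image terms. The sketch below illustrates that metric in plain Python. It computes the evaluation score only. It is not SegNet's training loss and not the authors' code. The function name `corpus_iou`, the flat per-image label lists, the void label `255`, and the choice to skip classes with an empty union are all our assumptions.

```python
def corpus_iou(predictions, ground_truths, num_classes, ignore_label=255):
    """Mean over classes of (dataset-wide intersection / dataset-wide union).

    predictions, ground_truths: lists of images, each a flat list of
    integer class labels (one per pixel). Assumed format, for illustration.
    """
    inter = [0] * num_classes  # pixels where pred == gt == c, summed over all images
    union = [0] * num_classes  # pixels where pred == c or gt == c, summed over all images
    for pred, gt in zip(predictions, ground_truths):
        for p, g in zip(pred, gt):
            if g == ignore_label:
                continue  # assumed convention: "void" ground-truth pixels are not scored
            if p == g:
                inter[p] += 1
                union[p] += 1
            else:
                # A mislabelled pixel enlarges the union of both classes involved.
                union[p] += 1
                union[g] += 1
    # Assumption: classes absent from both predictions and ground truth are skipped.
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious)


# Two 4-pixel "images" with classes {0, 1}; one pixel of the first is wrong.
preds = [[1, 0, 0, 0], [1, 1, 1, 1]]
gts = [[1, 1, 0, 0], [1, 1, 1, 1]]
print(corpus_iou(preds, gts, num_classes=2))  # (2/3 + 5/6) / 2 = 0.75
```

The error in the first image is diluted by every correctly labelled pixel of class 1 anywhere in the corpus. That dataset-wide coupling is what distinguishes optimizing the corpus-level score from optimizing a per-image average.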
