Pylon Model for Semantic Segmentation

Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches.

[1]  D. Greig,et al.  Exact Maximum A Posteriori Estimation for Binary Images , 1989 .

[2]  Charles A. Bouman,et al.  A multiscale random field model for Bayesian image segmentation , 1994, IEEE Trans. Image Process..

[3]  Narendra Ahuja,et al.  A Transform for Multiscale Image Segmentation by Integrated Edge and Region Detection , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Ronen Basri,et al.  Fast multiscale image segmentation , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[6]  Olga Veksler,et al.  Image segmentation by nested cuts , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[7]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Christopher K. I. Williams,et al.  Combining Belief Networks and Neural Networks for Scene Segmentation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Endre Boros,et al.  Pseudo-Boolean optimization , 2002, Discret. Appl. Math..

[10]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[11]  R. Zabih,et al.  What energy functions can be minimized via graph cuts , 2004 .

[12]  Vladimir Kolmogorov,et al.  An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[14]  Peter Auer,et al.  Generic object recognition with boosting , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[16]  Fernando Pereira,et al.  Structured Learning with Approximate Inference , 2007, NIPS.

[17]  Kevin P. Murphy,et al.  Figure-ground segmentation using a hierarchical conditional random field , 2007, Fourth Canadian Conference on Computer and Robot Vision (CRV '07).

[18]  Cordelia Schmid,et al.  Accurate Object Localization with Shape Masks , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Alexei A. Efros,et al.  Improving Spatial Support for Objects via Multiple Segmentations , 2007, BMVC.

[20]  Balaraman Ravindran,et al.  Image Modeling Using Tree Structured Conditional Random Fields , 2007, IJCAI.

[21]  Vladimir Kolmogorov,et al.  Applications of parametric maxflow in computer vision , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[22]  Narendra Ahuja,et al.  Learning subcategory relevances for category recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Richard Szeliski,et al.  A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Derek Hoiem,et al.  Learning CRFs Using Graph Cuts , 2008, ECCV.

[25]  Bernt Schiele,et al.  Hierarchical Support Vector Random Fields: Joint Training to Combine Local and Global Features , 2008, ECCV.

[26]  Pablo Arbeláez,et al.  Recognition using regions , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Stefano Soatto,et al.  Class segmentation and object localization with superpixel neighborhoods , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Pushmeet Kohli,et al.  Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[29]  Marc Toussaint,et al.  Multi-class image segmentation using conditional random fields and global classification , 2009, ICML '09.

[30]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[31]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[32]  Daphne Koller,et al.  Efficiently selecting regions for scene understanding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[33]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  Martial Hebert,et al.  Stacked Hierarchical Labeling , 2010, ECCV.

[35]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[36]  René Vidal,et al.  Using global bag of features models in random fields for joint categorization and segmentation of objects , 2011, CVPR 2011.

[37]  Charless C. Fowlkes,et al.  Contour Detection and Hierarchical Image Segmentation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Larry S. Davis,et al.  Piecing together the segmentation jigsaw using context , 2011, CVPR 2011.

[39]  Andrew Zisserman,et al.  Efficient Additive Kernels via Explicit Feature Maps , 2012, IEEE Trans. Pattern Anal. Mach. Intell..