Probabilistic Joint Image Segmentation and Labeling by Figure-Ground Composition

We propose a layered statistical model for image segmentation and labeling, obtained by combining independently extracted, possibly overlapping sets of figure-ground (FG) segmentations. The construction of consistent image segmentations, called tilings, is cast as optimization over sets of maximal cliques sampled from a graph that connects all non-overlapping figure-ground segment hypotheses. Potential functions over cliques combine unary, Gestalt-based figure qualities with pairwise compatibilities among spatially neighboring segments, constrained by T-junctions and the boundary interface statistics of real scenes. Building on the segmentation layer, we further derive a joint image segmentation and labeling model (JSL) which, given a bag of FGs, constructs a joint probability distribution over both the compatible image interpretations (tilings) composed from those segments and their labeling into categories. Drawing samples from the joint distribution can be interpreted as first sampling a tiling and then sampling a labeling conditioned on that tiling. We learn the segmentation and labeling parameters jointly by maximum likelihood, using a novel estimation procedure that we refer to as incremental saddle-point approximation. The partition function over tilings and labelings is approximated with increasing accuracy by including incorrect configurations that candidate models rate as probable during learning. State-of-the-art results are reported on the Berkeley, Stanford and Pascal VOC datasets, where an improvement of 28% was achieved for the segmentation task alone (tiling), and an accuracy of 47.8% was obtained on the VOC12 test set for semantic labeling (JSL).
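The composition of tilings from non-overlapping segment hypotheses can be illustrated with a small sketch. It is not the authors' implementation: the toy pixel sets, the `unary` and `pairwise` scores, and all function names are hypothetical, and the maximal cliques are enumerated exhaustively with a basic Bron-Kerbosch recursion rather than sampled as in the model described above.

```python
from itertools import combinations


def compatibility_graph(segments):
    """Connect two segment hypotheses when their pixel sets do not overlap."""
    adj = {i: set() for i in range(len(segments))}
    for i, j in combinations(range(len(segments)), 2):
        if segments[i].isdisjoint(segments[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (basic Bron-Kerbosch, no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended further
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def tiling_score(clique, unary, pairwise):
    """Sum of per-segment scores plus pairwise scores for segment pairs."""
    score = sum(unary[i] for i in clique)
    for i, j in combinations(sorted(clique), 2):
        score += pairwise.get((i, j), 0.0)
    return score


# Toy 2x2 image with pixels 0..3 and six overlapping FG hypotheses (made up).
segments = [
    frozenset({0, 1}),     # 0: top row
    frozenset({2, 3}),     # 1: bottom row
    frozenset({0, 2}),     # 2: left column
    frozenset({1, 3}),     # 3: right column
    frozenset({0, 1, 2}),  # 4: L-shaped region
    frozenset({3}),        # 5: single pixel
]
unary = [0.6, 0.7, 0.5, 0.5, 0.9, 0.4]      # hypothetical figure qualities
pairwise = {(0, 1): 0.3, (4, 5): 0.1}       # hypothetical compatibilities

adj = compatibility_graph(segments)
tilings = maximal_cliques(adj)
best = max(tilings, key=lambda c: tiling_score(c, unary, pairwise))
print(sorted(best))
```

In this toy setting each maximal clique is a candidate tiling, and the highest-scoring one is selected. A probabilistic treatment would instead normalize the exponentiated scores over the candidate tilings and then sample a labeling conditioned on the chosen tiling.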
