Multi-Class Image Labeling with Top-Down Segmentation and Generalized Robust $P^N$ Potentials

We propose a novel formulation for the scene labeling problem which is able to combine object detections with pixel-level information in a Conditional Random Field (CRF) framework. Since object detection and multi-class image labeling are mutually informative problems, pixel-wise segmentation can benefit from powerful object detectors and vice versa. The main contribution of the current work lies in the incorporation of topdown object segmentations as generalized robust P N potentials into the CRF formulation. These potentials present a principled manner to convey soft object segmentations into a unified energy minimization framework, enabling joint optimization and thus mutual benefit for both problems. As our results show, the proposed approach outperforms the state-of-the-art methods on the categories for which object detections are available. Quantitative and qualitative experiments show the effectiveness of the proposed method.

[1]  Andrew Zisserman,et al.  OBJ CUT , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[2]  W. F. Clocksin,et al.  Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction , 2011, International Journal of Computer Vision.

[3]  W. F. Clocksin,et al.  Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction , 2012, International Journal of Computer Vision.

[4]  Bernt Schiele,et al.  A Dynamic Conditional Random Field Model for Joint Labeling of Object and Scene Classes , 2008, ECCV.

[5]  C. V. Jawahar,et al.  Scene Text Recognition using Higher Order Language Priors , 2009, BMVC.

[6]  Roberto Cipolla,et al.  Segmentation and Recognition Using Structure from Motion Point Clouds , 2008, ECCV.

[7]  Roberto Cipolla,et al.  Semantic texton forests for image categorization and segmentation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Daphne Koller,et al.  Learning Spatial Context: Using Stuff to Find Things , 2008, ECCV.

[9]  Nir Friedman,et al.  Probabilistic Graphical Models , 2009, Data-Driven Computational Neuroscience.

[10]  Pushmeet Kohli,et al.  Associative hierarchical CRFs for object class image segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[11]  Roberto Cipolla,et al.  Semantic object classes in video: A high-definition ground truth database , 2009, Pattern Recognit. Lett..

[12]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[13]  Pushmeet Kohli,et al.  Robust Higher Order Potentials for Enforcing Label Consistency , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Anat Levin,et al.  Learning to Combine Bottom-Up and Top-Down Segmentation , 2006, ECCV.

[16]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Vladimir Kolmogorov,et al.  An experimental comparison of min-cut/max- flow algorithms for energy minimization in vision , 2001, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[19]  Luc Van Gool,et al.  Segmentation-Based Urban Traffic Scene Understanding , 2009, BMVC.

[20]  Juergen Gall,et al.  Class-specific Hough forests for object detection , 2009, CVPR.

[21]  Philip H. S. Torr,et al.  What, Where and How Many? Combining Object Detectors and CRFs , 2010, ECCV.

[22]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  Philip H. S. Torr,et al.  Combining Appearance and Structure from Motion Features for Road Scene Understanding , 2009, BMVC.

[24]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25]  Thomas Deselaers,et al.  ClassCut for Unsupervised Class Segmentation , 2010, ECCV.

[26]  Luc Van Gool,et al.  Shape-from-recognition: Recognition enables meta-data transfer , 2009, Computer Vision and Image Understanding.

[27]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[28]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[29]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[30]  F. Jurie,et al.  Category Level Object Segmentation by Combining Bag-of-words Models and Markov Random Fields , 2008 .