Inference Scene Labeling by Incorporating Object Detection with Explicit Shape Model

In this paper, we incorporate shape detection into contextual scene labeling and make use of both shape, texture, and context information in a graphical representation. We propose a candidacy graph, whose vertices are two types of recognition candidates for either a superpixel or a window patch. The superpixel candidates are generated by a discriminative classifier with textural features as well as the window proposals by a learned deformable templates model in the bottom-up steps. The contextual and competitive interactions between graph vertices, in form of probabilistic connecting edges, are defined by two types of contextual metrics and the overlapping of their image domain, respectively. With this representation, a composite clustering sampling algorithm is proposed to fast search the optimal convergence globally using the Markov Chain Monte Carlo (MCMC). Our approach is applied on both lotus hill institute (LHI) and MSRC public datasets and achieves the state-of-art results.

[1]  Zhuowen Tu,et al.  Image Parsing: Unifying Segmentation, Detection, and Recognition , 2005, International Journal of Computer Vision.

[2]  Krzysztof R. Apt,et al.  The Essence of Constraint Propagation , 1998, Theor. Comput. Sci..

[3]  Miguel Á. Carreira-Perpiñán,et al.  Multiscale conditional random fields for image labeling , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[4]  Guillermo Sapiro,et al.  Geodesic Matting: A Framework for Fast Interactive Image and Video Segmentation and Matting , 2009, International Journal of Computer Vision.

[5]  Brendan J. Frey,et al.  A Revolution: Belief Propagation in Graphs with Cycles , 1997, NIPS.

[6]  Cordelia Schmid,et al.  Accurate Object Detection with Deformable Shape Models Learnt from Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Lin Yang,et al.  Multiple Class Segmentation Using A Unified Framework over Mean-Shift Patches , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[9]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Shimon Ullman,et al.  Combined Top-Down/Bottom-Up Segmentation , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Andrea Vedaldi,et al.  Objects in Context , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[13]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[14]  Antti Oulasvirta,et al.  Computer Vision – ECCV 2006 , 2006, Lecture Notes in Computer Science.

[15]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[16]  Zhuowen Tu,et al.  Active skeleton for non-rigid object detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17]  R. Zemel,et al.  Multiscale conditional random fields for image labeling , 2004, CVPR 2004.

[18]  Vladimir Kolmogorov,et al.  What energy functions can be minimized via graph cuts? , 2002, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Serge J. Belongie,et al.  Object categorization using co-occurrence, location and appearance , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Zhuowen Tu,et al.  Auto-context and its application to high-level vision tasks , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Song-Chun Zhu,et al.  Generalizing Swendsen–Wang for Image Analysis , 2007, Journal of Computational and Graphical Statistics.

[22]  Patrick Pérez,et al.  Hierarchical statistical models for the fusion of multiresolution image data , 1995, Proceedings of IEEE International Conference on Computer Vision.

[23]  Benjamin Z. Yao,et al.  Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks , 2007, EMMCVPR.

[24]  Song-Chun Zhu,et al.  Deformable Template As Active Basis , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[25]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[26]  Stephen Gould,et al.  Multi-Class Segmentation with Relative Location Prior , 2008, International Journal of Computer Vision.