Non-Gibbsian Markov Random Field Models for Contextual Labelling of Structured Scenes

In this paper we propose a non-Gibbsian Markov random field to model the spatial and topological relationships between objects in structured scenes. The field is formulated in terms of conditional probabilities learned from a set of training images. A locally consistent labelling of new scenes is achieved by relaxing the Markov random field directly using these conditional probabilities. We evaluate our model on a varied collection of several hundred hand-segmented images of buildings.
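
The general idea of relaxing a labelling with learned conditional probabilities can be illustrated with a small sketch. This is not the paper's formulation, which the abstract does not spell out. It assumes that each scene is a region adjacency graph, that the local conditional of a region's label factorises into pairwise terms P(label | neighbour's label) estimated from co-occurrence counts with Laplace smoothing, and that relaxation is a greedy sweep that relabels one region at a time. The function names (`learn_conditionals`, `relax`), the label set, and the toy building scene are all hypothetical.

```python
from collections import Counter, defaultdict
import math


def learn_conditionals(training_scenes, labels, alpha=1.0):
    """Estimate P(label | neighbour_label) from labelled adjacency graphs.

    training_scenes: list of (node_labels, edges), where node_labels maps a
    node id to its label and edges is a list of undirected (i, j) pairs.
    alpha is a Laplace smoothing constant (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for node_labels, edges in training_scenes:
        for i, j in edges:
            # Count each undirected edge in both directions.
            counts[node_labels[j]][node_labels[i]] += 1
            counts[node_labels[i]][node_labels[j]] += 1
    cond = {}
    for nb in labels:
        total = sum(counts[nb].values()) + alpha * len(labels)
        cond[nb] = {lab: (counts[nb][lab] + alpha) / total for lab in labels}
    return cond


def relax(init_labels, edges, cond, labels, max_sweeps=20):
    """Greedy relaxation: relabel each node to maximise the product of the
    pairwise conditionals given its neighbours' current labels, until no
    node changes (a locally consistent labelling) or max_sweeps is reached.
    """
    neighbours = defaultdict(list)
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    current = dict(init_labels)
    for _ in range(max_sweeps):
        changed = False
        for node in sorted(current):
            def score(lab):
                # Log of the product of pairwise conditionals.
                return sum(math.log(cond[current[nb]][lab])
                           for nb in neighbours[node])
            best = max(labels, key=score)
            # Only switch on a strict improvement, so ties keep the old label.
            if score(best) > score(current[node]):
                current[node] = best
                changed = True
        if not changed:
            break
    return current


# Toy data: sky above roof above wall, with two doors attached to the wall.
LABELS = ["sky", "roof", "wall", "door"]
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]
TRAIN = [({0: "sky", 1: "roof", 2: "wall", 3: "door", 4: "door"}, EDGES)] * 3

cond = learn_conditionals(TRAIN, LABELS)
# Node 4 starts with an implausible label ("sky" next to a wall).
noisy = {0: "sky", 1: "roof", 2: "wall", 3: "door", 4: "sky"}
result = relax(noisy, EDGES, cond, LABELS)
```

Because the update works directly on the conditionals, nothing in the sketch requires them to be consistent with a single joint (Gibbs) distribution, which is the sense in which such a model can be non-Gibbsian. The greedy sweep is only one possible relaxation schedule.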
