Holistic context modeling using semantic co-occurrences

We present a simple framework for modeling contextual relationships between visual concepts. The framework combines ideas from previous object-centric methods (which model contextual relationships between objects in an image, such as their co-occurrence patterns) and scene-centric methods (which learn a holistic context model from the entire image, known as its “gist”). This is accomplished without demarcating individual concepts or regions in the image. First, the output of a generic appearance-based concept detection system is used to define a semantic space, in which each axis represents a semantic feature. Next, a context model is learned for each concept in this semantic space, using mixtures of Dirichlet distributions. Finally, an image is represented as a vector of posterior concept probabilities under these contextual concept models. These posterior probabilities are shown to be remarkably noise-free and to provide an effective model of the contextual relationships between semantic concepts in natural images. This is further demonstrated through an experimental evaluation on two vision tasks, scene classification and image annotation, using benchmark datasets. The results show that, besides being quite simple to compute, the proposed context models outperform state-of-the-art systems on both tasks.
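The final step described above, mapping a semantic-space vector to posterior concept probabilities under per-concept Dirichlet mixture context models, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the concept labels, the mixture weights, and the Dirichlet parameters are all hypothetical, the concept prior is assumed uniform, and the learning of the mixtures (for example by EM) is not shown.

```python
import math

def dirichlet_logpdf(pi, alpha):
    """Log density of a Dirichlet(alpha) at the probability vector pi."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(p) for a, p in zip(alpha, pi))

def mixture_loglik(pi, weights, alphas):
    """Log-likelihood of pi under a mixture of Dirichlets (log-sum-exp)."""
    terms = [math.log(w) + dirichlet_logpdf(pi, a)
             for w, a in zip(weights, alphas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def contextual_posteriors(pi, context_models, priors=None):
    """Posterior probability of each concept given the semantic vector pi.

    context_models maps a concept name to (weights, alphas), the parameters
    of that concept's Dirichlet mixture. A uniform prior is assumed when
    none is given.
    """
    names = list(context_models)
    if priors is None:
        priors = {n: 1.0 / len(names) for n in names}
    logs = [math.log(priors[n]) + mixture_loglik(pi, *context_models[n])
            for n in names]
    m = max(logs)
    unnorm = [math.exp(v - m) for v in logs]
    z = sum(unnorm)
    return {n: u / z for n, u in zip(names, unnorm)}

# Hypothetical three-axis semantic space (e.g. "sand", "road", "car") and
# made-up context models for two concepts.
context_models = {
    "beach": ([0.6, 0.4], [[8.0, 2.0, 1.0], [5.0, 4.0, 1.0]]),
    "street": ([1.0], [[1.0, 2.0, 8.0]]),
}
pi = [0.7, 0.2, 0.1]  # appearance-based concept probabilities for one image
posteriors = contextual_posteriors(pi, context_models)
print(posteriors)
```

The resulting dictionary plays the role of the image representation in the last step of the abstract: one posterior probability per concept, summing to one. Working in the log domain avoids underflow when the semantic space has many axes.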
