Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both its local appearance and its relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearance-based model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba's proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling is quite beneficial and may be the critical missing ingredient in scene understanding systems.
