Influence of the amount of context learned for improving object classification when simultaneously learning object and contextual cues

Humans use visual context to improve object recognition. Yet, many machine vision algorithms still focus on local object features, discarding surrounding features as unwanted clutter. Here we study the impact of learning contextual cues while training an object classifier. In a new image database with 10 object categories and 28,800 images, objects were presented in contextual or uniform backgrounds. Both the fraction of contextual backgrounds during training and the spatial extent of context were analysed. Local object features and broader context features were extracted by two biologically inspired algorithms, previously used for object and scene classification, respectively: HMAX, applied to a tight window around every object, and a “Gist” algorithm, applied to a larger yet still localized window. The descriptors from both algorithms were combined and processed by a Support Vector Machine. The recognition rate increased from 29%, without contextual cues, to 43% for objects presented in their context.
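The data flow described above (a tight object window, a larger context window, concatenated descriptors, and a training set with a controlled fraction of contextual backgrounds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the window scale factor, and the sampling scheme are all hypothetical. The HMAX and Gist extractors and the SVM are left out entirely. Plain lists stand in for the descriptors.

```python
import random


def expand_window(box, scale, img_w, img_h):
    """Grow a tight (x0, y0, x1, y1) box about its centre by `scale`.

    The result is clamped to the image bounds. The scale factor is an
    assumption for illustration; the abstract only says the context
    window is "larger yet still localized".
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) * scale / 2.0
    half_h = (y1 - y0) * scale / 2.0
    return (
        max(0, int(round(cx - half_w))),
        max(0, int(round(cy - half_h))),
        min(img_w, int(round(cx + half_w))),
        min(img_h, int(round(cy + half_h))),
    )


def combine_descriptors(object_vec, context_vec):
    """Concatenate the object descriptor and the context descriptor
    into one feature vector for a downstream classifier."""
    return list(object_vec) + list(context_vec)


def mix_training_set(contextual, uniform, context_fraction, n_total, seed=0):
    """Draw n_total training samples, a given fraction of them from the
    contextual-background pool and the rest from the uniform pool.

    Hypothetical sampling scheme; the paper's exact protocol is not
    given in the abstract.
    """
    rng = random.Random(seed)
    n_ctx = round(context_fraction * n_total)
    chosen = rng.sample(contextual, n_ctx) + rng.sample(uniform, n_total - n_ctx)
    rng.shuffle(chosen)
    return chosen


if __name__ == "__main__":
    tight = (40, 40, 60, 60)
    print(expand_window(tight, 2.0, 100, 100))
    print(combine_descriptors([0.1, 0.2], [0.9]))
```

Varying `scale` corresponds to changing the spatial extent of context, and varying `context_fraction` corresponds to changing the share of contextual backgrounds seen during training, the two factors the study analyses.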
