What are the Visual Features Underlying Rapid Object Recognition?

Machine vision has made significant progress in recent years. Robust face detection and identification algorithms are already available to consumers, and modern computer vision algorithms for generic object recognition are now able to cope with the richness and complexity of natural visual scenes. Unlike early models of object recognition, which emphasized figure-ground segmentation and the spatial relations between parts, recent successful approaches compute loose collections of image features without prior segmentation or any explicit encoding of spatial relations. Although these remain simplistic models of visual processing, they suggest that, in principle, the bottom-up activation of a loose collection of image features could support the rapid recognition of natural object categories and provide an initial, coarse visual representation before more complex visual routines and attentional mechanisms come into play. Focusing on biologically plausible computational models of (bottom-up) pre-attentive visual recognition, we review some of the key visual features described in the literature. We discuss whether these feature-based representations are consistent with classical theories from visual psychology and test their ability to account for human performance on a rapid object categorization task.
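To make the idea of a "loose collection of image features" concrete, here is a minimal bag-of-features sketch in Python. It is an illustration under stated assumptions, not the model evaluated in this work. The 2x2 patch size, the three-word codebook, the hard nearest-neighbour assignment and the names `extract_patches`, `nearest_codeword` and `bag_of_features` are all hypothetical choices made for illustration. In practice the codebook would be learned from data, and biologically inspired models usually start from oriented, Gabor-like filter responses rather than raw pixels.

```python
from collections import Counter
from typing import List, Sequence

Image = List[List[float]]
Vector = List[float]


def extract_patches(image: Image, size: int) -> List[Vector]:
    """Cut the image into non-overlapping size x size patches, each flattened row-major."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            patches.append([image[r + i][c + j] for i in range(size) for j in range(size)])
    return patches


def nearest_codeword(patch: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Hard vector quantization: index of the codeword with the smallest squared distance."""
    def sq_dist(word: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(patch, word))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def bag_of_features(image: Image, codebook: Sequence[Sequence[float]], size: int) -> List[float]:
    """Normalized, orderless histogram of codeword assignments.

    Only how often each codeword occurs is kept; where each patch was
    located in the image is discarded, so no spatial relations are encoded.
    """
    counts = Counter(nearest_codeword(p, codebook) for p in extract_patches(image, size))
    total = sum(counts.values())
    return [counts[k] / total for k in range(len(codebook))]


# Hypothetical codebook of 2x2 patches: dark, bright, vertical edge.
codebook = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]

img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]

# Same 2x2 blocks as `img`, but with the bright and edge blocks swapped.
scrambled = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

print(bag_of_features(img, codebook, 2))        # prints [0.5, 0.25, 0.25]
print(bag_of_features(scrambled, codebook, 2))  # prints [0.5, 0.25, 0.25]
```

Because the histogram only counts codeword occurrences, rearranging the image blocks leaves the representation unchanged. This insensitivity to spatial layout illustrates what is meant above by the absence of any explicit encoding of spatial relations.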