Attentive processing improves object recognition

The human visual system can recognize several thousand object categories irrespective of their position and size. This combination of selectivity and invariance is built up gradually across several stages of visual processing. However, recognizing multiple objects in cluttered visual scenes presents a difficult problem for human and machine vision systems alike. The human visual system has evolved to perform two stages of visual processing: a pre-attentive, parallel stage, in which the entire visual field is processed at once, and a slow, serial, attentive stage, in which an attentional spotlight selects a region of interest in the input image for “specialized” analysis. We argue that this strategy evolved to overcome the limitations of purely feedforward processing in the presence of clutter and crowding. Using a Bayesian model of attention together with a hierarchical model of feedforward recognition on a data set of real-world images, we show that this two-stage attentive processing can improve recognition in cluttered and crowded conditions.
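The clutter argument can be illustrated with a deliberately tiny sketch. It is not the Bayesian attention model or the hierarchical feedforward model used here. The integer feature codes, the density-based saliency map, the prototypes `A` and `B`, and every helper name (`pooled_features`, `classify`, `saliency`, `attend_and_recognize`) are hypothetical. Max-pooling over a window stands in for the invariance-building pooling stages. A saliency value computed at every location stands in for the parallel pre-attentive stage. A spotlight that moves to the saliency peak, with inhibition of return, stands in for the serial attentive stage.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: 0 = background, 1..4 = local feature codes (e.g. edge types).
Grid = List[List[int]]
Window = Tuple[int, int, int, int]  # (row0, row1, col0, col1), end-exclusive

N_FEATURES = 4
# Hypothetical object prototypes: which feature codes each object contains.
PROTOTYPES: Dict[str, List[int]] = {"A": [1, 1, 0, 0], "B": [0, 0, 1, 1]}


def pooled_features(grid: Grid, window: Window) -> List[int]:
    """Feedforward stage: max-pool each feature map over the window (position tolerance)."""
    r0, r1, c0, c1 = window
    pooled = [0] * N_FEATURES
    for r in range(r0, r1):
        for c in range(c0, c1):
            if grid[r][c]:
                pooled[grid[r][c] - 1] = 1  # max over binary responses
    return pooled


def classify(features: List[int]) -> str:
    """Nearest prototype by Hamming distance; a tie means the evidence is ambiguous."""
    dists = {name: sum(a != b for a, b in zip(features, proto))
             for name, proto in PROTOTYPES.items()}
    best = min(dists.values())
    winners = [name for name, d in dists.items() if d == best]
    return winners[0] if len(winners) == 1 else "ambiguous"


def window_around(grid: Grid, r: int, c: int, radius: int) -> Window:
    """Square spotlight of the given radius, clipped to the image bounds."""
    return (max(r - radius, 0), min(r + radius + 1, len(grid)),
            max(c - radius, 0), min(c + radius + 1, len(grid[0])))


def saliency(grid: Grid, seen: Set[Tuple[int, int]], radius: int) -> List[List[int]]:
    """Pre-attentive stage: count of not-yet-attended features around every location."""
    sal = []
    for r in range(len(grid)):
        row = []
        for c in range(len(grid[0])):
            r0, r1, c0, c1 = window_around(grid, r, c, radius)
            row.append(sum(1 for i in range(r0, r1) for j in range(c0, c1)
                           if grid[i][j] and (i, j) not in seen))
        sal.append(row)
    return sal


def attend_and_recognize(grid: Grid, n_fixations: int, radius: int = 1) -> List[str]:
    """Serial stage: move the spotlight to the saliency peak, recognize, inhibit, repeat."""
    seen: Set[Tuple[int, int]] = set()
    labels: List[str] = []
    for _ in range(n_fixations):
        sal = saliency(grid, seen, radius)
        # max() returns the first maximal location in row-major order
        r, c = max(((i, j) for i in range(len(grid)) for j in range(len(grid[0]))),
                   key=lambda p: sal[p[0]][p[1]])
        if sal[r][c] == 0:
            break  # nothing left to attend
        window = window_around(grid, r, c, radius)
        labels.append(classify(pooled_features(grid, window)))
        r0, r1, c0, c1 = window
        # Inhibition of return: features inside the attended window are not revisited.
        seen.update((i, j) for i in range(r0, r1) for j in range(c0, c1))
    return labels


# A scene containing two objects with distinct features.
scene = [[0] * 8 for _ in range(8)]
scene[1][1], scene[1][2], scene[2][1], scene[2][2] = 1, 2, 2, 1  # object A
scene[5][5], scene[5][6], scene[6][5], scene[6][6] = 3, 4, 4, 3  # object B

print(classify(pooled_features(scene, (0, 8, 0, 8))))  # ambiguous
print(attend_and_recognize(scene, n_fixations=2))       # ['A', 'B']
```

Pooling over the whole scene reports every feature as present, so the feature vector lies equally far from both prototypes. This is a toy version of the binding and crowding problem. Restricting the pooling window to one attended region at a time recovers each object in turn. The cost is one serial fixation per object.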