What and where: a Bayesian inference theory of visual attention

In the theoretical framework described in this thesis, attention is part of the inference process that solves the visual recognition problem of what is where. The theory proposes a computational role for attention and leads to a model that predicts some of its main properties at the level of psychophysics and physiology. In our approach, the main goal of the visual system is to infer the identity and the position of objects in visual scenes: spatial attention emerges as a strategy to reduce the uncertainty in shape information, while feature-based attention reduces the uncertainty in spatial information. Featural and spatial attention represent two distinct modes of a single computational process that solves the problem of recognizing and localizing objects, especially in difficult recognition tasks such as those posed by cluttered natural scenes. We describe a specific computational model and relate it to the known functional anatomy of attention. We show that several well-known attentional phenomena, including bottom-up pop-out effects, multiplicative modulation of neuronal tuning curves, and shifts in contrast responses, emerge naturally as predictions of the model. We also show that the Bayesian model accurately predicts human eye fixations (considered as a proxy for shifts of attention) in natural scenes. Finally, we demonstrate that the same model, when used to modulate information in an existing feedforward model of the ventral stream, improves that model's object recognition performance in clutter. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
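The central claim, that spatial attention reduces uncertainty about identity while feature-based attention reduces uncertainty about location, can be illustrated with a toy joint inference over one object variable and one location variable. The sketch below is not the model from the thesis: the two-object, three-location scene, the likelihood table `LIK`, the prior values, and the helper names `posterior` and `entropy_bits` are all assumptions chosen for illustration. Attention is represented here simply as a non-uniform prior on one of the two variables.

```python
import numpy as np

def posterior(lik, prior_obj, prior_loc):
    """Return the marginals P(object | image) and P(location | image).

    lik[o, l] is an assumed likelihood P(image | object=o, location=l).
    The priors over object and location are taken to be independent.
    """
    joint = lik * prior_obj[:, None] * prior_loc[None, :]
    joint = joint / joint.sum()          # normalize to P(object, location | image)
    return joint.sum(axis=1), joint.sum(axis=0)

def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical cluttered scene: object 0 is well supported at location 0,
# object 1 is well supported at location 2.
LIK = np.array([[0.9, 0.1, 0.1],
                [0.1, 0.1, 0.9]])
UNIFORM_OBJ = np.full(2, 1 / 2)
UNIFORM_LOC = np.full(3, 1 / 3)

# No attention: uniform priors on both variables.
p_obj_base, p_loc_base = posterior(LIK, UNIFORM_OBJ, UNIFORM_LOC)

# Spatial attention: a prior concentrated on location 0 (assumed values).
p_obj_spatial, _ = posterior(LIK, UNIFORM_OBJ, np.array([0.8, 0.1, 0.1]))

# Feature-based attention: a prior concentrated on object 1 (assumed values).
_, p_loc_feature = posterior(LIK, np.array([0.1, 0.9]), UNIFORM_LOC)

print("identity entropy, no attention vs. spatial attention:",
      entropy_bits(p_obj_base), entropy_bits(p_obj_spatial))
print("location entropy, no attention vs. feature attention:",
      entropy_bits(p_loc_base), entropy_bits(p_loc_feature))
```

In this toy scene the identity posterior is maximally ambiguous without attention, and concentrating the location prior resolves it in favor of the object at the attended location; concentrating the object prior likewise sharpens the location posterior around the attended object. The effect depends on the chosen likelihoods and priors and is not a general guarantee.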
