Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention

A critical function in both machine vision and biological vision systems is attentional selection of scene regions worthy of further analysis by higher-level processes such as object recognition. Here we present the first model of spatial attention that (1) can be applied to arbitrary static and dynamic image sequences with interactive tasks and (2) combines a general computational implementation of both bottom-up (BU) saliency and dynamic top-down (TD) task relevance; the claimed novelty lies in the combination of these elements and in the fully computational nature of the model. The BU component computes a saliency map from 12 low-level multi-scale visual features. The TD component computes a low-level signature of the entire image and learns to associate different classes of signatures with the different gaze patterns recorded from human subjects performing a task of interest. We measured the ability of this model to predict the eye movements of people playing contemporary video games. We found that the TD model alone predicts where humans look about twice as well as the BU model alone; in addition, a combined BU*TD model performs significantly better than either individual component. Qualitatively, the combined model predicts some easy-to-describe but hard-to-compute aspects of attentional selection, such as shifting attention leftward when approaching a left turn along a racing track. Thus, our study demonstrates the advantages of integrating BU factors derived from a saliency map with TD factors learned from image and task contexts in predicting where humans look while performing complex visually guided behavior.
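
The BU/TD combination described above can be summarized in a short sketch. The Python below is a minimal illustration under stated assumptions, not the paper's implementation: the function names (bottom_up_saliency, top_down_relevance, combined_priority) are hypothetical, the linear signature-to-map readout merely stands in for the paper's learned association between classes of image signatures and recorded gaze patterns, and reading "BU*TD" as a pointwise product of the two maps is an assumption based on the notation.

```python
import numpy as np

def bottom_up_saliency(feature_maps):
    """Average min-max-normalized feature maps into one BU saliency map.

    `feature_maps`: list of 2-D arrays, one per low-level feature
    (the abstract mentions 12 multi-scale features; any number works here).
    """
    acc = np.zeros_like(feature_maps[0], dtype=float)
    for fm in feature_maps:
        fm = fm.astype(float)
        rng = fm.max() - fm.min()
        if rng > 0:
            acc += (fm - fm.min()) / rng  # skip constant maps
    return acc / len(feature_maps)

def top_down_relevance(signature, readout_weights, map_shape):
    """Map a global image signature to a TD relevance map.

    A simple linear readout is assumed here; the paper instead learns to
    associate classes of signatures with human gaze patterns.
    """
    flat = readout_weights @ signature   # (H*W,) predicted relevance
    flat = np.maximum(flat, 0.0)         # relevance is non-negative
    return flat.reshape(map_shape)

def combined_priority(bu_map, td_map, eps=1e-9):
    """Pointwise BU*TD combination, normalized to sum to 1."""
    prod = bu_map * td_map
    return prod / (prod.sum() + eps)

if __name__ == "__main__":
    # Toy data only: random features, a random "gist" signature, and a
    # random stand-in for the learned readout weights.
    rng = np.random.default_rng(0)
    h, w, n_features, sig_dim = 30, 40, 12, 64
    features = [rng.random((h, w)) for _ in range(n_features)]
    signature = rng.random(sig_dim)
    weights = rng.standard_normal((h * w, sig_dim))

    bu = bottom_up_saliency(features)
    td = top_down_relevance(signature, weights, (h, w))
    priority = combined_priority(bu, td)
    iy, ix = np.unravel_index(priority.argmax(), priority.shape)
    print(f"predicted gaze target: row {iy}, col {ix}")
```

In this reading, the TD map acts as a task-dependent gain on the BU map: locations must be both visually salient and task-relevant to attract the predicted gaze, which is consistent with the reported result that the product outperforms either map alone.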
