Modeling the influence of action on spatial attention in visual interactive environments

A large number of studies have been reported on top-down influences on visual attention. However, less progress has been made in understanding and modeling these mechanisms in real-world tasks. In this paper, we propose an approach for learning spatial attention that takes into account the influence of physical actions on top-down attention. For this purpose, we focus on interactive visual environments (video games), which serve as modest simulations of real-world tasks in which a player has to attend to certain aspects of the visual stimulus and perform actions to achieve a goal. The basic idea is to learn a mapping from the current mental state of the game player, represented by past actions and observations, to the player's gaze fixation. We follow a data-driven approach in which a model is trained on data from several players and tested on a new subject. In particular, this paper makes two contributions: 1) it employs multi-modal information, including mean eye position, scene gist, physical actions, bottom-up saliency, and tagged events, for state representation, and 2) it analyzes different methods of combining bottom-up and top-down influences. Compared with other top-down task-driven and bottom-up spatio-temporal models, our approach achieves higher normalized scanpath saliency (NSS) scores in predicting eye positions.
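The abstract refers to combining bottom-up and top-down maps and to NSS-based evaluation without giving details. The minimal Python sketch below illustrates how such a combination and the standard normalized scanpath saliency (NSS) score could be computed; the function names, the two combination schemes shown (pointwise product and weighted sum), and the map sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def combine_maps(bottom_up, top_down, method="product", alpha=0.5):
    """Combine a bottom-up saliency map with a top-down prediction map.

    Pointwise product and weighted sum are two commonly used schemes;
    the exact set of methods compared in the paper is assumed here.
    """
    if method == "product":
        return bottom_up * top_down                      # pointwise multiplication
    elif method == "weighted_sum":
        return alpha * bottom_up + (1.0 - alpha) * top_down
    raise ValueError(f"unknown combination method: {method}")

def nss(saliency_map, fixations):
    """Normalized scanpath saliency: z-score the map, then average the
    normalized values at the observed fixation locations (x, y)."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    return float(np.mean([s[y, x] for (x, y) in fixations]))

if __name__ == "__main__":
    # Toy example: a random bottom-up map and a top-down map peaked at the fixation.
    rng = np.random.default_rng(0)
    bu = rng.random((60, 80))
    td = np.zeros((60, 80))
    td[30, 40] = 1.0
    pred = combine_maps(bu, td, method="weighted_sum", alpha=0.3)
    print("NSS at the true fixation:", nss(pred, [(40, 30)]))
```

A higher NSS indicates that the predicted map assigns above-average (in standard-deviation units) values to the locations where the subject actually fixated, which is how the comparison against the baseline models is framed in the abstract.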
