What/Where to Look Next? Modeling Top-Down Visual Attention in Complex Interactive Environments

Several visual attention models have been proposed to describe eye movements over simple stimuli and tasks such as free viewing or visual search. Yet, to date, no computational framework can reliably mimic human gaze behavior in more complex environments and tasks such as urban driving. In addition, benchmark datasets, scoring techniques, and top-down model architectures are not yet well understood. In this paper, we describe new task-dependent approaches to modeling top-down overt visual attention based on graphical models for probabilistic inference and reasoning. We describe a dynamic Bayesian network that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is performed over object-related functions derived either from manual annotations of objects in video scenes or from state-of-the-art object detection/recognition algorithms. Evaluating over approximately 3 hours (approximately 315,000 eye fixations and 12,000 saccades) of observers playing three video games (time-scheduling, driving, and flight combat), we show that our approach is significantly more predictive of eye fixations than: 1) simpler classifier-based models, also developed here, that map a signature of a scene (multimodal information from gist, bottom-up saliency, physical actions, and events) to eye positions; 2) 14 state-of-the-art bottom-up saliency models; and 3) brute-force algorithms such as mean eye position. Our results show that the proposed model exploits and reasons over spatio-temporal visual data more effectively than the state of the art.
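To make the dynamic Bayesian network idea concrete, the following is a minimal sketch (not the authors' implementation) of forward filtering over a discrete "attended object" state: a transition model captures how attention tends to shift between objects across frames, and per-frame evidence (e.g., detector confidences for objects present in the scene) is multiplied in to update the belief. The object labels, transition probabilities, and evidence values below are hypothetical placeholders; in the paper, the corresponding quantities are learned from annotated game-play data.

```python
# Minimal forward-filtering sketch for a DBN over attended objects (illustrative only).
import numpy as np

objects = ["pot", "burger", "customer", "cash_register"]  # hypothetical object set
K = len(objects)

# P(X_t = j | X_{t-1} = i): attention tends to stay on the current object,
# with a small probability of shifting to any other object.
transition = np.full((K, K), 0.1)
np.fill_diagonal(transition, 0.7)
transition /= transition.sum(axis=1, keepdims=True)

def filter_step(prior, likelihood):
    """One forward-filtering step: predict with the transition model,
    then weight by per-object evidence and renormalize."""
    predicted = prior @ transition          # P(X_t | evidence up to t-1)
    posterior = predicted * likelihood      # multiply in P(E_t | X_t)
    return posterior / posterior.sum()

# Start from a uniform belief over attended objects.
belief = np.full(K, 1.0 / K)

# Per-frame evidence, e.g., detection confidences for each object (made-up values).
frames = [np.array([0.9, 0.2, 0.1, 0.1]),
          np.array([0.3, 0.8, 0.2, 0.1])]

for likelihood in frames:
    belief = filter_step(belief, likelihood)
    print(dict(zip(objects, np.round(belief, 3))))
```

The posterior over attended objects can then be mapped to a spatial prediction, for example by placing probability mass on the annotated or detected bounding box of each object; the actual model additionally conditions on task-relevant variables rather than on detector scores alone.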
