An Object-Based Bayesian Framework for Top-Down Visual Attention

We introduce a new task-independent framework to model top-down overt visual attention, based on graphical models for probabilistic inference and reasoning. We describe a Dynamic Bayesian Network (DBN) that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is performed over object-related functions, which are fed by manual annotations of objects in video scenes or by state-of-the-art object detection models. Evaluating over ∼3 hours (approximately 315,000 eye fixations and 12,600 saccades) of observers playing 3 video games (time-scheduling, driving, and flight combat), we show that our approach is significantly more predictive of eye fixations than: 1) simpler classifier-based models, also developed here, that map a signature of a scene (multimodal information from gist, bottom-up saliency, physical actions, and events) to eye positions; 2) 14 state-of-the-art bottom-up saliency models; and 3) brute-force algorithms such as mean eye position. Our results show that the proposed model is more effective at employing and reasoning over spatio-temporal visual data.
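The abstract does not specify the DBN's structure, but the core operation it describes, inferring a distribution over attended objects from per-frame object evidence, is a standard forward-filtering recursion. The following is a minimal sketch under assumed choices: a single hidden variable (which of N objects is attended), a hand-set "sticky" transition matrix, and detector scores treated as observation likelihoods. The function name, the matrix values, and that interpretation of the scores are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def forward_filter(prior, transition, likelihoods):
    """One forward-filtering pass over a sequence of evidence vectors.

    prior       : (N,) initial belief over the N candidate objects
    transition  : (N, N) P(attend j at t | attended i at t-1)
    likelihoods : (T, N) per-frame evidence, e.g. detector scores
                  interpreted as P(observation_t | attended object)
    Returns a (T, N) array of posterior beliefs over attended objects.
    """
    beliefs = np.zeros_like(likelihoods, dtype=float)
    belief = prior
    for t, like in enumerate(likelihoods):
        belief = like * (transition.T @ belief)  # predict, then weight by evidence
        belief /= belief.sum()                   # renormalize to a distribution
        beliefs[t] = belief
    return beliefs

# Toy example (illustrative numbers): 3 objects, sticky attention,
# detector scores per frame favoring object 0, then object 2.
prior = np.full(3, 1 / 3)
transition = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
likelihoods = np.array([[0.7, 0.2, 0.1],
                        [0.6, 0.3, 0.1],
                        [0.1, 0.2, 0.7]])
beliefs = forward_filter(prior, transition, likelihoods)
print(beliefs.argmax(axis=1))  # MAP attended object per frame: [0 0 2]
```

In a full model along the lines the abstract describes, the posterior over attended objects would then be mapped to a spatial distribution over the frame, for instance by spreading each object's probability mass over its annotated or detected bounding box, and the predicted fixation would be compared against recorded gaze.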
