An Object-Based Bayesian Framework for Top-Down Visual Attention

We introduce a new task-independent framework to model top-down overt visual attention, based on graphical models for probabilistic inference and reasoning. We describe a Dynamic Bayesian Network (DBN) that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is performed over object-related functions, which are fed by manual annotations of objects in video scenes or by state-of-the-art object detection models. Evaluating over ∼3 hours (approximately 315,000 eye fixations and 12,600 saccades) of observers playing 3 video games (time-scheduling, driving, and flight combat), we show that our approach is significantly more predictive of eye fixations than: 1) simpler classifier-based models, also developed here, that map a signature of a scene (multimodal information from gist, bottom-up saliency, physical actions, and events) to eye positions; 2) 14 state-of-the-art bottom-up saliency models; and 3) brute-force algorithms such as mean eye position. Our results show that the proposed model is more effective at employing and reasoning over spatio-temporal visual data.
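The abstract does not specify the DBN's structure, but the core operation it describes, inferring a distribution over attended objects from per-frame object evidence, is a standard forward-filtering recursion. The following is a minimal sketch under assumed choices: a single hidden variable (which of N objects is attended), a hand-set "sticky" transition matrix, and detector scores treated as observation likelihoods. The function name, the matrix values, and that interpretation of the scores are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def forward_filter(prior, transition, likelihoods):
    """One forward-filtering pass over a sequence of evidence vectors.

    prior       : (N,) initial belief over the N candidate objects
    transition  : (N, N) P(attend j at t | attended i at t-1)
    likelihoods : (T, N) per-frame evidence, e.g. detector scores
                  interpreted as P(observation_t | attended object)
    Returns a (T, N) array of posterior beliefs over attended objects.
    """
    beliefs = np.zeros_like(likelihoods, dtype=float)
    belief = prior
    for t, like in enumerate(likelihoods):
        belief = like * (transition.T @ belief)  # predict, then weight by evidence
        belief /= belief.sum()                   # renormalize to a distribution
        beliefs[t] = belief
    return beliefs

# Toy example (illustrative numbers): 3 objects, sticky attention,
# detector scores per frame favoring object 0, then object 2.
prior = np.full(3, 1 / 3)
transition = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
likelihoods = np.array([[0.7, 0.2, 0.1],
                        [0.6, 0.3, 0.1],
                        [0.1, 0.2, 0.7]])
beliefs = forward_filter(prior, transition, likelihoods)
print(beliefs.argmax(axis=1))  # MAP attended object per frame: [0 0 2]
```

In a full model along the lines the abstract describes, the posterior over attended objects would then be mapped to a spatial distribution over the frame, for instance by spreading each object's probability mass over its annotated or detected bounding box, and the predicted fixation would be compared against recorded gaze.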
