What has been missed for predicting human attention in viewing driving clips?

Recent progress in understanding and simulating human visual attention allocation in scene perception is based mainly on studies with static images. Natural vision, however, requires us to extract visual information that is constantly changing due to egocentric movements or the dynamics of the world. It remains unclear to what extent spatio-temporal regularity, a regularity inherent in dynamic vision, affects human gaze distribution and saliency computation in visual attention models. In this free-viewing eye-tracking study we manipulated the spatio-temporal regularity of traffic videos by presenting them as a normal video sequence, a reversed video sequence, a normal frame sequence, or a randomised frame sequence. The recorded human gaze allocation was then used as the ‘ground truth’ to examine the predictive ability of a number of state-of-the-art visual attention models. The analysis revealed high inter-observer agreement in gaze allocation, but all of the tested attention models performed significantly worse than humans. The models’ inferior predictive power was evident in their gaze predictions, which were indistinguishable across stimulus presentation sequences, and in their weak central fixation bias. Our findings suggest that a realistic visual attention model for processing dynamic scenes should incorporate human visual sensitivity to spatio-temporal regularity as well as the central fixation bias.
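The abstract does not specify the scoring metric, but model evaluation against fixation ‘ground truth’ is conventionally done with an ROC analysis of the saliency map: saliency values at fixated pixels are treated as positives and the remaining pixels as negatives, and the area under the resulting ROC curve summarises predictive ability. The following is a minimal sketch of that computation; the function name, array layout, and toy data are illustrative assumptions, not the study’s actual pipeline.

```python
import numpy as np

def saliency_auc(saliency_map: np.ndarray, fixations: np.ndarray) -> float:
    """ROC AUC of a 2-D saliency map against fixated pixel locations.

    `fixations` is an (N, 2) integer array of (row, col) coordinates
    on the same grid as the map. Saliency values at fixated pixels are
    the positives; all remaining pixels are the negatives.
    """
    sal = saliency_map.astype(float).ravel()
    # Flat indices of the fixated pixels.
    pos_idx = fixations[:, 0] * saliency_map.shape[1] + fixations[:, 1]
    pos = sal[pos_idx]
    neg = np.delete(sal, pos_idx)
    # Sweep thresholds from high to low to trace the ROC curve.
    thresholds = np.sort(np.unique(pos))[::-1]
    tpr = [float((pos >= t).mean()) for t in thresholds]
    fpr = [float((neg >= t).mean()) for t in thresholds]
    # Close the curve at (0, 0) and (1, 1), then integrate.
    return float(np.trapz([0.0] + tpr + [1.0], [0.0] + fpr + [1.0]))

# Toy usage: a random map scored against random fixations gives ~0.5.
rng = np.random.default_rng(0)
sal = rng.random((60, 80))                     # hypothetical saliency map
fix = rng.integers(0, [60, 80], size=(50, 2))  # hypothetical fixations
print(saliency_auc(sal, fix))
```

Under a metric of this kind, an inter-observer model built from the fixations of the other observers provides the human benchmark reported above, against which the computational models fall short.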
