Learning to Predict Sequences of Human Visual Fixations

Most state-of-the-art visual attention models estimate the probability distribution over image locations of where the eyes will fixate, the so-called saliency map. Yet these models do not predict the temporal sequence of eye fixations, which may be valuable both for better predicting human eye fixations and for understanding the role of different cues during visual exploration. In this paper, we present a method for predicting the sequence of human eye fixations, learned from recorded human eye-tracking data. We use least-squares policy iteration (LSPI) to learn a visual exploration policy that mimics the recorded eye-fixation examples. The model uses a different set of parameters for each stage of visual exploration, capturing how the importance of the cues changes along the scanpath. In a series of experiments, we demonstrate the effectiveness of using LSPI to combine multiple cues at different stages of the scanpath. The learned parameters suggest that low-level and high-level (semantic) cues are similarly important at the first fixation of the scanpath, and that the contribution of high-level cues keeps increasing during visual exploration. Results show that our approach obtains state-of-the-art performance on two challenging data sets: 1) the OSIE data set and 2) the MIT data set.
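The core of the method is a least-squares policy iteration loop that learns how to weight image cues when choosing the next fixation, with a separate weight vector for each stage of the scanpath. Below is a minimal, self-contained sketch of that idea in Python. It is not the authors' implementation: the number of stages, the cue names, the reward definition, and the synthetic scanpath samples are all illustrative assumptions; only the LSTD-Q solve and the policy-iteration loop follow the standard LSPI formulation.

```python
# Minimal LSPI sketch (illustrative, not the paper's code): learn per-stage weights
# that combine cue values at candidate fixation locations into an action score.
import numpy as np

rng = np.random.default_rng(0)

N_STAGES = 3       # assumed number of scanpath stages, each with its own weight block
N_CUES = 4         # assumed cues per location: low-level saliency, faces, text, center bias
N_CANDIDATES = 50  # candidate fixation locations per step
GAMMA = 0.9

def features(stage, cue_values):
    """Stage-indexed feature vector: cue values placed in the block for this stage."""
    phi = np.zeros(N_STAGES * N_CUES)
    phi[stage * N_CUES:(stage + 1) * N_CUES] = cue_values
    return phi

def lstdq(samples, w, lam=1e-3):
    """One LSTD-Q solve: fit Q(s,a) = phi(s,a).w under the greedy policy induced by w."""
    d = N_STAGES * N_CUES
    A = lam * np.eye(d)
    b = np.zeros(d)
    for stage, cand_cues, a_idx, reward, next_stage, next_cand_cues in samples:
        phi = features(stage, cand_cues[a_idx])
        if next_cand_cues is None:                 # terminal: end of scanpath
            phi_next = np.zeros(d)
        else:                                      # greedy next fixation under current w
            q_next = [features(next_stage, c) @ w for c in next_cand_cues]
            phi_next = features(next_stage, next_cand_cues[int(np.argmax(q_next))])
        A += np.outer(phi, phi - GAMMA * phi_next)
        b += reward * phi
    return np.linalg.solve(A, b)

# Synthetic "recorded scanpath" samples: reward is high when the chosen candidate's
# cues resemble a hidden human preference that shifts toward semantic cues at later
# stages, mimicking the trend reported in the abstract.
true_w = np.array([[1.0, 1.0, 1.0, 0.5],   # first fixation: low-level and semantic cues comparable
                   [0.7, 1.3, 1.3, 0.3],
                   [0.4, 1.6, 1.6, 0.2]])  # later stages: semantic cues dominate
samples = []
for _ in range(500):
    stage = int(rng.integers(N_STAGES))
    cand = rng.random((N_CANDIDATES, N_CUES))
    a = int(np.argmax(cand @ true_w[stage]))       # "human" fixation choice
    r = float(cand[a] @ true_w[stage])
    nxt = stage + 1
    nxt_cand = rng.random((N_CANDIDATES, N_CUES)) if nxt < N_STAGES else None
    samples.append((stage, cand, a, r, nxt if nxt < N_STAGES else None, nxt_cand))

# Policy iteration: repeat LSTD-Q until the weights stop changing.
w = np.zeros(N_STAGES * N_CUES)
for _ in range(20):
    w_new = lstdq(samples, w)
    if np.linalg.norm(w_new - w) < 1e-6:
        break
    w = w_new

print(w.reshape(N_STAGES, N_CUES))  # per-stage cue weights recovered by LSPI
```

Running the sketch recovers stage-specific cue weights from the synthetic samples; in the actual model, the candidate cue values would come from low-level saliency and semantic feature maps, and the reward from agreement with recorded human fixations.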
