Semantically-Based Human Scanpath Estimation with HMMs

We present a method for estimating human scanpaths, the sequences of gaze shifts that follow visual attention over an image. Scanpaths are modeled in terms of three principal factors that influence human attention: low-level feature saliency, spatial position, and semantic content. Low-level feature saliency is formulated as transition probabilities between image regions, based on feature differences. The effect of spatial position on gaze shifts is modeled as a Lévy flight whose shifts follow a 2D Cauchy distribution. To account for semantic content, we propose a Hidden Markov Model (HMM) with a Bag-of-Visual-Words descriptor of image regions. An HMM is well suited for this purpose in that 1) its hidden states, obtained by unsupervised learning, can represent latent semantic concepts; 2) the prior distribution over the hidden states describes visual attraction to these concepts; and 3) the transition probabilities capture human gaze-shift patterns. The proposed method is applied to task-driven viewing processes. Experiments and analysis performed on human eye-gaze data verify the effectiveness of this method.
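To illustrate just the spatial component described above, the following is a minimal sketch of a Lévy-flight scanpath whose gaze shifts follow an isotropic 2D Cauchy distribution. All names (`sample_cauchy_step`, `simulate_scanpath`) and parameter values are hypothetical; the abstract does not specify them, and the full method additionally weights shifts by feature saliency and semantic content, which this sketch omits.

```python
import numpy as np

def sample_cauchy_step(rng, scale=30.0):
    # Hypothetical helper: an isotropic 2D Cauchy sample equals a
    # multivariate t with 1 degree of freedom, i.e. z / sqrt(u) with
    # z ~ N(0, I) and u ~ chi^2(1). The heavy tails produce occasional
    # long "flights" between clusters of short local shifts.
    z = rng.standard_normal(2)
    u = rng.chisquare(1)
    return scale * z / np.sqrt(u)

def simulate_scanpath(n_fixations=20, img_size=(640, 480), scale=30.0, seed=0):
    # Spatial-only scanpath: start at the image center and take
    # Cauchy-distributed gaze shifts, clipped to the image bounds.
    rng = np.random.default_rng(seed)
    w, h = img_size
    pos = np.array([w / 2.0, h / 2.0])
    path = [pos.copy()]
    for _ in range(n_fixations - 1):
        pos = pos + sample_cauchy_step(rng, scale)
        pos[0] = np.clip(pos[0], 0, w - 1)
        pos[1] = np.clip(pos[1], 0, h - 1)
        path.append(pos.copy())
    return np.array(path)

path = simulate_scanpath()  # array of (x, y) fixation coordinates
```

A histogram of the step lengths from many such runs would show the heavy-tailed distribution characteristic of Lévy flights, in contrast to the thin tails a Gaussian random walk would produce.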
