Markov chain based computational visual attention model that learns from eye tracking data

Highlights:
- We use a Markov chain to model visual attention.
- Our visual attention model is based on low-level and high-level image features.
- We use real eye-tracking data to train the model.
- We measure the performance of attention models by comparing them with human fixations.
- Our model is more consistent with the attentional deployment of humans.

Abstract:
Computational visual attention models are a topic of increasing importance in the computer understanding of images. Most existing attention models rely on bottom-up computation that often does not match actual human attention. To address this problem, we propose a novel visual attention model that is learned from actual eye-tracking data. We use a Markov chain to model the relationship between image features and saliency, then train a support vector regression (SVR) on true eye-tracking data to predict the transition probabilities of the Markov chain. Finally, a saliency map predicting the user's attention is obtained from the stationary distribution of this chain. Experimental evaluations on several benchmark datasets demonstrate that the proposed approach is comparable with or outperforms state-of-the-art models on prediction of human eye fixations and interest-region detection.

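Below is a minimal sketch of the pipeline the abstract describes, assuming an image already segmented into regions with fixed-length feature vectors. The region features and the fixation-derived training targets are random placeholders, scikit-learn's SVR stands in for the LIBSVM implementation, and the rule that makes transition probabilities proportional to a region's predicted attractiveness is one simple illustrative choice, not the paper's learned formulation.

```python
# Sketch: SVR-predicted region scores -> Markov chain -> stationary distribution as saliency.
# Features, fixation targets, and the transition rule are placeholders for illustration only.
import numpy as np
from sklearn.svm import SVR  # stand-in for LIBSVM used in the paper

rng = np.random.default_rng(0)

# Toy data: n regions, each with a d-dimensional feature vector.
n_regions, dim = 64, 10
features = rng.normal(size=(n_regions, dim))

# Hypothetical training target: an attention score per region derived from
# eye-tracking fixation counts (random here, for illustration only).
fixation_score = rng.random(n_regions)

# Step 1: train an SVR mapping region features to attention scores.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.01)
svr.fit(features, fixation_score)
predicted = svr.predict(features)           # predicted attractiveness per region
predicted = np.clip(predicted, 1e-6, None)  # keep strictly positive

# Step 2: build a Markov chain over regions. Transition probability from region i
# to region j grows with the predicted attractiveness of j (simplified choice).
P = np.tile(predicted, (n_regions, 1))
np.fill_diagonal(P, 0.0)              # discourage self-transitions
P = P / P.sum(axis=1, keepdims=True)  # row-stochastic transition matrix

# Step 3: saliency = stationary distribution of the chain (power iteration).
def stationary_distribution(P, n_iter=1000, tol=1e-10):
    """Iterate pi_{t+1} = pi_t @ P until convergence."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            break
        pi = nxt
    return pi

saliency = stationary_distribution(P)
print("per-region saliency:", saliency.round(4)[:8], "...")
```

The saliency value of a region is the long-run fraction of time a random walk on the chain spends there, which is exactly the stationary distribution computed in the last step; mapping these values back to pixel locations yields the saliency map.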