Predicting When Saliency Maps are Accurate and Eye Fixations Consistent

Many computational models of visual attention use image features and machine learning techniques to predict eye fixation locations, which they output as saliency maps. Recently, the success of Deep Convolutional Neural Networks (DCNNs) in object recognition has opened a new avenue for computational models of visual attention, owing to the tight link between visual attention and object recognition. In this paper, we show that features from object-recognition DCNNs can be used to make predictions that enrich the information provided by saliency models. Namely, we can estimate the reliability of a saliency model directly from the raw image; this estimate serves as a meta-saliency measure that may be used to select the best saliency algorithm for a given image. Analogously, the consistency of eye fixations among subjects, i.e., the agreement between the fixation locations of different subjects, can also be predicted and used by a designer to assess whether subjects reach a consensus about salient image locations.
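
The meta-saliency idea described above can be illustrated with a minimal sketch. It assumes that a DCNN feature vector has already been extracted for each image and that a per-image accuracy score is available for each saliency model on a training set. The ridge regressor, the function names (`fit_ridge`, `predict`, `select_best_model`), and the `alpha` parameter are illustrative assumptions, not the regressor or interface used in the paper.

```python
import numpy as np


def _with_bias(features):
    """Append a constant column so the regressor can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_ridge(features, scores, alpha=1.0):
    """Fit a ridge regressor mapping image features to accuracy scores.

    features: (n_images, n_dims) array of precomputed DCNN features (assumed).
    scores:   (n_images,) array of measured accuracy for ONE saliency model.
    Returns a weight vector of length n_dims + 1 (last entry is the bias).
    """
    X = _with_bias(features)
    reg = alpha * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    return np.linalg.solve(X.T @ X + reg, X.T @ scores)


def predict(weights, features):
    """Predict accuracy scores for a batch of feature vectors."""
    return _with_bias(features) @ weights


def select_best_model(weights_per_model, feature_vector):
    """Pick the saliency model with the highest predicted accuracy.

    weights_per_model: dict mapping a (hypothetical) model name to the
                       weights returned by fit_ridge for that model.
    feature_vector:    (n_dims,) feature vector of a single new image.
    Returns (best_model_name, dict of predicted scores).
    """
    batch = feature_vector[None, :]
    preds = {name: float(predict(w, batch)[0])
             for name, w in weights_per_model.items()}
    return max(preds, key=preds.get), preds
```

In use, one regressor would be fitted per saliency model on training images, and `select_best_model` would then be called on the features of an unseen image. The same regression setup could, under the same assumptions, be pointed at an inter-subject consistency score instead of a model accuracy score.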
