Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study

Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as “visual saliency.” Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets (e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here, we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, three natural image datasets, and two video datasets, using three evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased, which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast, yet yield competitive eye movement prediction accuracy. Different models often find the same stimuli easy or difficult. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our study allows one to assess the state of the art, helps organize this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.
