A Benchmark of Computational Models of Saliency to Predict Human Fixations

Many computational models of visual attention, built on a wide variety of approaches, aim to predict where people look in images. Each model is usually introduced by demonstrating its performance on a new set of images, which makes direct comparison between models difficult. To alleviate this problem, we propose a benchmark data set of 300 natural images with eye tracking data from 39 observers for comparing model performance. We measure how well 10 models predict ground truth fixations using three different metrics, and we provide a way for people to submit new models for evaluation online. We find that the Judd et al. and Graph-based visual saliency models perform best. In general, models with blurrier maps and models that include a center bias perform well. We add and optimize a blur and a center bias for each model and show that performance improves. We compare model performance to three baselines: chance, center, and human performance. We show that human performance increases with the number of observers, up to a limit. We analyze the similarity of different models using multidimensional scaling and explore the relationship between model performance and fixation consistency. Finally, we offer observations about how to improve saliency models in the future.
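To make the blur and center-bias adjustment concrete, the following is a minimal sketch of one way such a post-processing step could look. It is not the benchmark's implementation. The abstract does not say how the blur and center bias are combined, so the linear blend, the Gaussian center prior, and every name here (`blur_and_center`, `center_prior`, `sigma`, `weight`, `spread`) are assumptions made for illustration.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian kernel normalized to sum to 1 (radius = ceil(3 * sigma))."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(grid, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(grid[0])
    return [[sum(k * row[min(max(x + j - radius, 0), width - 1)]
                 for j, k in enumerate(kernel))
             for x in range(width)]
            for row in grid]

def transpose(grid):
    return [list(col) for col in zip(*grid)]

def blur(grid, sigma):
    """Separable Gaussian blur: along rows first, then along columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(grid, kernel)), kernel))

def center_prior(height, width, spread=0.25):
    """Gaussian bump peaking at the image center (assumed form of the bias).

    `spread` is the standard deviation as a fraction of each dimension.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = spread * height, spread * width
    return [[math.exp(-((y - cy) ** 2 / (2 * sy * sy)
                        + (x - cx) ** 2 / (2 * sx * sx)))
             for x in range(width)]
            for y in range(height)]

def normalize(grid):
    """Rescale values to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in grid]
    return [[(v - lo) / span for v in row] for row in grid]

def blur_and_center(saliency, sigma, weight):
    """Blur a saliency map, then blend it linearly with the center prior.

    weight = 0 keeps only the blurred map; weight = 1 keeps only the prior.
    """
    blurred = normalize(blur(saliency, sigma))
    prior = center_prior(len(saliency), len(saliency[0]))
    return [[(1.0 - weight) * b + weight * p for b, p in zip(b_row, p_row)]
            for b_row, p_row in zip(blurred, prior)]

# Toy 9x9 map with a single salient pixel in the top-left corner.
toy = [[0.0] * 9 for _ in range(9)]
toy[0][0] = 1.0
no_bias = blur_and_center(toy, sigma=1.0, weight=0.0)
full_bias = blur_and_center(toy, sigma=1.0, weight=1.0)
```

Under these assumptions, optimizing the two parameters for a given model would amount to searching over `sigma` and `weight` and keeping the pair that scores best on held-out fixation data under whichever metric is being used.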
