Information-theoretic model comparison unifies saliency metrics

Significance

Where do people look in images? Predicting eye movements from images is an active field of study, with more than 50 quantitative prediction models competing to explain scene-viewing behavior. Yet the rules for this competition are unclear. Using a principled metric for model comparison (information gain), we quantify progress in the field and show how formulating the models probabilistically resolves discrepancies in other metrics. We have also developed model assessment tools that reveal where models fail at the database, image, and pixel levels. These tools will facilitate future advances in saliency modeling and are made freely available in an open-source software framework (www.bethgelab.org/code/pysaliency).

Abstract

Learning the properties of an image associated with human gaze placement is important both for understanding how biological systems explore the environment and for computer vision applications. There is a large literature on quantitative eye movement models that seek to predict fixations from images (a task sometimes termed "saliency" prediction). A well-known problem in the field is that existing model comparison metrics give inconsistent results, causing confusion. We argue that the primary reason for these inconsistencies is that different metrics and models use different definitions of what a "saliency map" entails. For example, some metrics expect a model to account for the image-independent central fixation bias, whereas others penalize a model that does. Here we bring saliency evaluation into the domain of information theory by framing fixation prediction models probabilistically and calculating information gain. We jointly optimize the scale, the center bias, and the spatial blurring of all models within this framework. Evaluating existing metrics on these rephrased models produces almost perfect agreement in model rankings across the metrics. Model performance is thereby separated from center bias and spatial blurring, avoiding the confounding of these factors in model comparison. We additionally provide a method that shows where and how models fail to capture information in the fixations at the pixel level. These methods are readily extended to spatiotemporal models of fixation scanpaths, and we provide a software package to facilitate their use.
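To make the evaluation concrete, the sketch below converts an arbitrary saliency map into a probability distribution over pixels and scores it by information gain (bits per fixation) relative to an image-independent baseline. This is a minimal sketch assuming only NumPy and SciPy; the function names, the blur width, and the Gaussian center-bias baseline are illustrative placeholders, not the fitted parameters or the pysaliency API from the paper.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def saliency_to_density(saliency_map, sigma=8.0, eps=1e-12):
        # Blur, clip to positive values, and normalize so the map sums to 1,
        # turning the saliency map into a probability distribution over pixels.
        blurred = gaussian_filter(saliency_map.astype(float), sigma)
        blurred = np.clip(blurred, eps, None)
        return blurred / blurred.sum()

    def average_log_likelihood(density, fixations):
        # Mean log2-likelihood of the observed fixations, in bits per fixation.
        rows, cols = np.asarray(fixations).T
        return np.mean(np.log2(density[rows, cols]))

    def information_gain(model_density, baseline_density, fixations):
        # Information gain of the model over the baseline, in bits per fixation.
        return (average_log_likelihood(model_density, fixations)
                - average_log_likelihood(baseline_density, fixations))

    # Toy example: a random "saliency map" scored against a Gaussian center-bias
    # baseline (the center-bias width of 0.25 * min(h, w) is an arbitrary choice).
    h, w = 240, 320
    saliency_map = np.random.rand(h, w)               # stand-in for a model output
    fixations = [(120, 160), (100, 200), (130, 150)]  # stand-in (row, col) fixations

    ys, xs = np.mgrid[0:h, 0:w]
    center_bias = np.exp(-((ys - h / 2) ** 2 + (xs - w / 2) ** 2)
                         / (2 * (0.25 * min(h, w)) ** 2))
    baseline_density = center_bias / center_bias.sum()
    model_density = saliency_to_density(saliency_map)

    print(information_gain(model_density, baseline_density, fixations))

In the framework described above, the blur width, the scale, and the weight of the center bias are not fixed by hand as in this toy example but are fitted jointly for every model by maximizing the likelihood of the fixation data; the hard-coded values here only serve to keep the sketch self-contained.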
