What Do Different Evaluation Metrics Tell Us About Saliency Models?

How best to evaluate a saliency model's ability to predict where humans look in images is an open research question. The choice of evaluation metric depends on how saliency is defined and how the ground truth is represented. Metrics differ in how they rank saliency models. These differences arise from how false positives and false negatives are treated, whether viewing biases are accounted for, whether spatial deviations are factored in, and how the saliency maps are pre-processed. In this paper, we provide an analysis of eight evaluation metrics and their properties. Using systematic experiments and visualizations of metric computations, we make saliency scores more interpretable and the evaluation of saliency models more transparent. Building on the differences in metric properties and behaviors, we make recommendations for metric selection under specific assumptions and for specific applications.
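The point about pre-processing can be made concrete with a small example. The following is a minimal Python sketch, not the paper's implementation, of three metrics widely used in saliency benchmarks: Normalized Scanpath Saliency (NSS), the Pearson correlation coefficient (CC), and similarity (SIM, a histogram intersection). The abstract does not say which eight metrics are analyzed, so this selection is an illustrative assumption. The function names `nss`, `cc`, and `sim` and the 1x6 toy maps are also assumptions. For simplicity, the fixation density map is left unblurred, whereas benchmarks typically blur fixations with a Gaussian.

```python
import math
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def _std(xs: Sequence[float]) -> float:
    # Population standard deviation; a constant map has no spread to normalize.
    m = _mean(xs)
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    if s == 0:
        raise ValueError("map is constant; normalization is undefined")
    return s


def nss(saliency: Sequence[float], fixations: Sequence[int]) -> float:
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels."""
    m, s = _mean(saliency), _std(saliency)
    fixated = [(v - m) / s for v, f in zip(saliency, fixations) if f]
    return _mean(fixated)


def cc(saliency: Sequence[float], density: Sequence[float]) -> float:
    """Pearson correlation between a saliency map and a fixation density map."""
    ms, md = _mean(saliency), _mean(density)
    cov = sum((a - ms) * (b - md) for a, b in zip(saliency, density)) / len(saliency)
    return cov / (_std(saliency) * _std(density))


def sim(saliency: Sequence[float], density: Sequence[float]) -> float:
    """Similarity: histogram intersection of both maps, each normalized to sum to 1."""
    ts, td = sum(saliency), sum(density)
    return sum(min(a / ts, b / td) for a, b in zip(saliency, density))


# Hypothetical toy 1x6 "image", flattened: two fixations, at pixels 1 and 4.
fixations = [0, 1, 0, 0, 1, 0]
density = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]  # unblurred density, for simplicity

models = {
    "exact": [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    "false_positive": [0.0, 1.0, 1.0, 0.0, 1.0, 0.0],  # one extra predicted pixel
    "constant_offset": [1.0, 2.0, 1.0, 1.0, 2.0, 1.0],  # "exact" + 1 everywhere
}

scores = {
    name: (nss(m, fixations), cc(m, density), sim(m, density))
    for name, m in models.items()
}
for name, (n, c, s) in scores.items():
    print(f"{name:16s} NSS={n:.3f} CC={c:.3f} SIM={s:.3f}")
```

In this toy case, adding a constant to every pixel of the exact prediction leaves NSS (about 1.414) and CC (1.0) unchanged, because both metrics center the map before scoring it. SIM, by contrast, falls from 1.0 to 0.5, because the offset spreads probability mass onto non-fixated pixels. The single false-positive pixel lowers all three scores.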
