On computational modeling of visual saliency: Examining what’s right, and what’s left

In the past decade, a large number of computational models of visual saliency have been proposed. Recently, several comprehensive benchmark studies have assessed the performance landscape of saliency models under varying conditions, using fixation data, annotated image regions, and stimulus patterns inspired by psychophysics. In this paper, we present a high-level examination of challenges in the computational modeling of visual saliency, with a heavy emphasis on human vision and neural computation. This includes a careful assessment of the metrics used to evaluate saliency models and the identification of remaining difficulties in assessing model performance. We also consider several issues relevant to all saliency models, including scale-space, border effects, and spatial or central bias. Additionally, we consider the biological plausibility of models, stepping away from exemplar input patterns toward a set of more general theoretical principles consistent with behavioral experiments. As a whole, this presentation identifies important obstacles that remain in visual saliency modeling and points to a number of promising avenues for further investigation.
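The abstract refers to metrics for model performance and to central bias without defining them. The following is a minimal sketch, not the paper's code, of two widely used fixation-based scores, Normalized Scanpath Saliency (NSS) and an ROC-based AUC, plus a content-free center-bias baseline. It assumes saliency maps are 2-D lists of floats and fixations are integer `(row, col)` pixel coordinates. The function names and the Gaussian width parameter `sigma_frac` are hypothetical choices made for illustration. The example shows why central bias complicates evaluation: a map that ignores image content can still score well when fixations cluster near the image center.

```python
import math
from statistics import mean, pstdev


def normalize(smap):
    """Return a z-scored copy of a 2-D saliency map (list of rows)."""
    flat = [v for row in smap for v in row]
    mu, sd = mean(flat), pstdev(flat)
    if sd == 0:
        # A constant map carries no information; every pixel scores 0.
        return [[0.0 for _ in row] for row in smap]
    return [[(v - mu) / sd for v in row] for row in smap]


def nss(smap, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels."""
    z = normalize(smap)
    return mean(z[r][c] for r, c in fixations)


def auc(smap, fixations):
    """ROC area, computed as the probability that a fixated pixel outscores a
    non-fixated pixel (ties count 1/2). Each fixated pixel is counted once;
    all remaining pixels serve as negatives (an assumed sampling scheme)."""
    fixed = set(fixations)
    pos = [smap[r][c] for r, c in fixed]
    neg = [v for r, row in enumerate(smap)
           for c, v in enumerate(row) if (r, c) not in fixed]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def center_bias_map(height, width, sigma_frac=0.25):
    """Hypothetical baseline: an isotropic Gaussian centered on the image.
    sigma_frac (assumed) sets the width relative to the shorter image side."""
    cy, cx = (height - 1) / 2, (width - 1) / 2
    sigma = sigma_frac * min(height, width)
    return [[math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2 * sigma ** 2))
             for c in range(width)] for r in range(height)]


if __name__ == "__main__":
    # Toy 9x9 "image" whose fixations all fall near the middle.
    fixations = [(4, 4), (3, 4), (4, 5), (5, 3)]
    baseline = center_bias_map(9, 9)
    print("AUC:", auc(baseline, fixations))
    print("NSS:", nss(baseline, fixations))
```

With these four near-center fixations, the baseline reaches an AUC of about 0.98 without looking at the image at all. Shuffled AUC, which draws negative samples from fixations recorded on other images, is one commonly used way to discount this effect.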