Modelling search for people in 900 scenes: A combined source model of eye guidance

How predictable are human eye movements during search in real-world scenes? We recorded 14 observers’ eye movements as they performed a search task (person detection) in 912 outdoor scenes. Observers were highly consistent in the regions fixated during search, even when the target was absent from the scene. These eye movements were used to evaluate computational models of search guidance from three sources: saliency, target features, and scene context. Each of these models independently outperformed a cross-image control in predicting human fixations. Models that combined sources of guidance ultimately predicted 94% of human agreement, with the scene context component providing the most explanatory power. None of the models, however, could match the precision and fidelity of an attentional map defined by human fixations. This work puts forth a benchmark for computational models of search in real-world scenes. Further improvements in modelling should capture the mechanisms underlying the selectivity of observers’ fixations during search.
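The abstract does not say how the three guidance maps are fused or how a map is scored against fixations. The Python below is a minimal sketch under stated assumptions. It assumes each source yields a 2-D map over the image and fuses the normalized maps as a weighted product; both the fusion rule and the exponents in `weights` are hypothetical. It then scores any map by ROC area, defined as the chance that a fixated cell outscores a cell drawn from the whole map. The same `auc` function can score two baselines: a map built from other observers' fixations on the same image (a human-agreement baseline), or one built from fixations recorded on a different image (a cross-image control). All function names are illustrative, as is the box-shaped smoothing in `fixation_map`; this is not the authors' code.

```python
"""Illustrative sketch (not the authors' code): fuse guidance maps and score them."""
from typing import List, Sequence, Tuple

Grid = List[List[float]]
Fixation = Tuple[int, int]  # (row, col) cell index


def normalize(m: Grid) -> Grid:
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [[(v - lo) / span if span > 0 else 0.0 for v in row] for row in m]


def combine(maps: Sequence[Grid], weights: Sequence[float], eps: float = 1e-6) -> Grid:
    """Assumed fusion rule: product over sources of (normalized map + eps) ** weight.

    eps keeps every factor strictly positive; a weight of 0 turns a source off.
    """
    norm = [normalize(m) for m in maps]
    rows, cols = len(norm[0]), len(norm[0][0])
    out = [[1.0] * cols for _ in range(rows)]
    for m, w in zip(norm, weights):
        for r in range(rows):
            for c in range(cols):
                out[r][c] *= (m[r][c] + eps) ** w
    return out


def fixation_map(fixations: Sequence[Fixation], rows: int, cols: int, radius: int = 1) -> Grid:
    """Add one count to every cell in a (2*radius+1)-wide square around each fixation.

    A crude, hypothetical stand-in for Gaussian smoothing of a fixation map.
    """
    m = [[0.0] * cols for _ in range(rows)]
    for fr, fc in fixations:
        for r in range(max(0, fr - radius), min(rows, fr + radius + 1)):
            for c in range(max(0, fc - radius), min(cols, fc + radius + 1)):
                m[r][c] += 1.0
    return m


def auc(score_map: Grid, fixations: Sequence[Fixation]) -> float:
    """ROC area: chance a fixated cell outscores a cell drawn from the whole map.

    Ties count as one half, so a constant map scores exactly 0.5.
    """
    if not fixations:
        raise ValueError("need at least one fixation")
    fix_vals = [score_map[r][c] for r, c in fixations]
    all_vals = [v for row in score_map for v in row]
    wins = 0.0
    for f in fix_vals:
        for v in all_vals:
            if f > v:
                wins += 1.0
            elif f == v:
                wins += 0.5
    return wins / (len(fix_vals) * len(all_vals))


if __name__ == "__main__":
    # Toy 2x2 maps standing in for saliency, target-feature and context models.
    saliency = [[0.1, 0.9], [0.2, 0.3]]
    target = [[0.0, 0.8], [0.1, 0.2]]
    context = [[0.2, 0.7], [0.9, 0.1]]
    model = combine([saliency, target, context], weights=[1.0, 1.0, 1.0])
    print(auc(model, [(0, 1)]))  # 0.875

    # Human-agreement baseline: map built from other observers' (toy) fixations.
    others = fixation_map([(0, 1), (0, 1), (1, 0)], rows=2, cols=2, radius=0)
    print(auc(others, [(0, 1), (1, 0)]))  # 0.75
```

Setting a weight to 0 drops that source from the product. This makes it straightforward to compare single-source and combined models under one scoring rule. Dividing a model's score by the human-agreement score is one simple way to express it as a fraction of human agreement. The abstract does not state whether that is the normalization behind the 94% figure.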
