Overt attentional correlates of memorability of scene images and their relationships to scene semantics

Computer vision-based research has shown that scene semantics (e.g., the presence of meaningful objects in a scene) can predict the memorability of scene images. Here, we investigated whether and to what extent overt attentional correlates, such as fixation map consistency (also called inter-observer congruency of fixation maps) and fixation counts, mediate the relationship between scene semantics and scene memorability. First, we confirmed that the higher the fixation map consistency of a scene, the higher its memorability. Moreover, both fixation map consistency and its correlation with scene memorability were highest in the first 2 seconds of viewing, suggesting that meaningful scene features, such as faces and humans, that produce more consistent fixation maps early in viewing may also be important for scene encoding. Second, we found that the relationship between scene semantics and scene memorability was partially (but not fully) mediated by fixation map consistency and fixation counts, both separately and together. Third, we found that fixation map consistency, fixation counts, and scene semantics each contributed significantly and additively to scene memorability. Together, these results suggest that eye-tracking measurements can complement computer vision-based algorithms and improve overall scene memorability prediction.
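
As an illustration of what "partially mediated" means in the summary above, the following is a minimal sketch of a single-mediator analysis using ordinary least squares. It is not the authors' analysis pipeline: the data are synthetic, the effect sizes are arbitrary, and the names `semantics`, `consistency`, `memorability`, and `mediation` are hypothetical. Partial mediation corresponds to a nonzero indirect effect (a × b) together with a direct effect (c′) that remains nonzero after controlling for the mediator.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def mediation(x, m, y):
    """Return (c, a, b, c_prime) from ordinary least squares fits.

    c       : total effect of x on y              (model: y ~ x)
    a       : effect of x on the mediator m       (model: m ~ x)
    b       : effect of m on y, controlling for x (model: y ~ x + m)
    c_prime : direct effect of x on y, controlling for m
    """
    vx, vm = cov(x, x), cov(m, m)
    cxm, cxy, cmy = cov(x, m), cov(x, y), cov(m, y)
    det = vx * vm - cxm ** 2  # determinant of the 2x2 normal equations
    c = cxy / vx
    a = cxm / vx
    b = (cmy * vx - cxm * cxy) / det
    c_prime = (cxy * vm - cxm * cmy) / det
    return c, a, b, c_prime


# Synthetic data (an assumption for illustration, not values from the study):
# semantics influences memorability both directly and through consistency.
rng = random.Random(0)
n = 500
semantics = [rng.gauss(0, 1) for _ in range(n)]
consistency = [0.6 * s + rng.gauss(0, 1) for s in semantics]
memorability = [
    0.3 * s + 0.5 * k + rng.gauss(0, 1)
    for s, k in zip(semantics, consistency)
]

c, a, b, c_prime = mediation(semantics, consistency, memorability)
indirect = a * b
print(f"total={c:.3f} direct={c_prime:.3f} indirect={indirect:.3f}")
```

With least squares fits, the total effect decomposes exactly into the direct and indirect parts (c = c′ + a × b). The sketch reports point estimates only; a full analysis would also test whether the indirect effect differs from zero, for example with bootstrap confidence intervals.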
