The impact of attentional, linguistic, and visual features during object naming

Object detection and identification are fundamental to human vision, and there is mounting evidence that objects guide the allocation of visual attention. However, the role of objects in tasks involving multiple modalities is less clear. To address this issue, we investigate object naming, a task in which participants must verbally identify the objects they see in photorealistic scenes. We report an eye-tracking study that investigates which features (attentional, visual, and linguistic) influence object naming. We find that the amount of visual attention directed toward an object, its position and saliency, and linguistic factors such as word frequency, animacy, and semantic proximity all significantly influence whether the object is named. We then ask how features from different modalities are combined during naming, and find significant interactions between saliency and position, between saliency and linguistic features, and between attention and position. We conclude that when the cognitive system performs tasks such as object naming, it uses input from one modality to constrain or enhance the processing of other modalities, rather than processing each input modality independently.
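The abstract does not say which statistical model was used. As an illustration only, one common way to ask whether two features combine independently is a logistic model of the binary named/not-named outcome that includes a product (interaction) term. The sketch below assumes such a model. The class `NamingModel`, its coefficient names, the encoding of position as a centrality score in [0, 1], and every coefficient value are hypothetical and are not estimates from the study. The sketch also omits the random effects for participants and scenes that a real analysis of this kind of data would typically include.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NamingModel:
    """Hypothetical logistic model of P(object is named).

    Predictors: saliency and position (a made-up centrality score in [0, 1]).
    Coefficients are illustration values only, not results from the study.
    """
    intercept: float
    b_saliency: float
    b_position: float
    b_interaction: float

    def linear_predictor(self, saliency: float, position: float) -> float:
        # Log-odds of naming. The product term lets the effect of saliency
        # depend on position (and the effect of position depend on saliency).
        return (self.intercept
                + self.b_saliency * saliency
                + self.b_position * position
                + self.b_interaction * saliency * position)

    def p_named(self, saliency: float, position: float) -> float:
        # Inverse logit maps log-odds to a probability in (0, 1).
        return 1.0 / (1.0 + math.exp(-self.linear_predictor(saliency, position)))

    def saliency_slope(self, position: float) -> float:
        # Change in log-odds per unit of saliency at a fixed position:
        # d(eta)/d(saliency) = b_saliency + b_interaction * position.
        return self.b_saliency + self.b_interaction * position


# Hypothetical coefficients chosen only to make the interaction visible.
model = NamingModel(intercept=-1.0, b_saliency=0.5, b_position=1.0, b_interaction=2.0)

for pos in (0.0, 1.0):
    print(f"position={pos}: saliency slope = {model.saliency_slope(pos):+.1f} log-odds, "
          f"P(named | saliency=0) = {model.p_named(0.0, pos):.2f}, "
          f"P(named | saliency=1) = {model.p_named(1.0, pos):.2f}")
```

With `b_interaction` set to 0, the saliency slope is identical at every position, which is what independent contributions of the two features look like on the log-odds scale. A reliably non-zero interaction term is the kind of evidence the abstract describes, where one input modulates how much another matters.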
