Scan Patterns Predict Sentence Production in the Cross-Modal Processing of Visual Scenes

Most everyday tasks involve multiple modalities, which raises the question of how the processing of these modalities is coordinated by the cognitive system. In this paper, we focus on the coordination of visual attention and linguistic processing during speaking. Previous research has shown that objects in a visual scene are fixated before they are mentioned, leading us to hypothesize that a participant's scan pattern can be used to predict what he or she will say. We test this hypothesis using a data set of cued descriptions of photo-realistic scenes. We demonstrate that similar scan patterns are correlated with similar sentences, both within and between visual scenes, and that this correlation holds for three phases of the language production process (target identification, sentence planning, and speaking). We also present a simple algorithm that uses scan patterns to accurately predict the associated sentences via similarity-based retrieval.
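The retrieval idea can be made concrete with a minimal sketch (not the authors' implementation): assume each scan pattern is encoded as a sequence of fixated-object labels and each training item pairs such a pattern with the sentence produced for it; a query pattern is then compared against the stored pairs with a generic sequence-similarity measure, and the sentence of the most similar stored pattern is returned. The function names, the label encoding, and the ratio-based similarity below are assumptions for illustration; the paper's own measure may instead be an alignment-based score over fixation sequences.

```python
from difflib import SequenceMatcher

def scan_similarity(pattern_a, pattern_b):
    # Similarity between two scan patterns, each represented as a sequence
    # of fixated-object labels.  SequenceMatcher's ratio (0..1) stands in
    # for whatever sequence-similarity measure is actually used.
    return SequenceMatcher(None, pattern_a, pattern_b).ratio()

def predict_sentence(query_pattern, training_pairs):
    # Similarity-based retrieval: return the sentence paired with the stored
    # scan pattern that is most similar to the query pattern.
    # `training_pairs` is a list of (scan_pattern, sentence) tuples.
    _, best_sentence = max(
        training_pairs,
        key=lambda pair: scan_similarity(query_pattern, pair[0]),
    )
    return best_sentence

# Hypothetical example: object labels and sentences are illustrative only.
training_pairs = [
    (["woman", "dog", "leash", "dog"], "The woman is walking her dog."),
    (["man", "bag", "counter", "bag"], "The man puts his bag on the counter."),
]
print(predict_sentence(["woman", "leash", "dog"], training_pairs))
```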
