Temporal Dynamics of Scan Patterns in Comprehension and Production

Moreno I. Coco (M.I.Coco@sms.ed.ac.uk) and Frank Keller (keller@inf.ed.ac.uk)
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, UK

Abstract

Speakers and listeners in a dialogue establish mutual understanding by coordinating their linguistic responses. When a visual scene is present, scan patterns on that scene are also coordinated. However, it is an open question which linguistic and scene factors affect coordination. In this paper, we investigate the coordination of scan patterns during the comprehension and generation of scene descriptions. We manipulate the animacy of the subject and the number of visual referents associated with it. Using Cross Recurrence Analysis, we demonstrate that coordination emerges only during linguistic processing, and that it is especially pronounced for inanimate unambiguous subjects. When the subject is referentially ambiguous (more than one visual object associated with it), scan pattern variability increases to the extent that the animacy effect is neutralized.

Keywords: scan patterns, situated language processing, cognitive dynamics, coordination

Introduction

When language is comprehended or produced in a visual context, information about fixated objects has to be integrated with the linguistic information that is concurrently processed (e.g., Spivey-Knowlton et al. 2002); this integration requires visual attention and sentence processing to be synchronized temporally (e.g., Zelinsky and Murphy 2000). Language comprehension and language production, however, differ in their temporal interaction with visual attention. In a comprehension task, visual attention is guided by linguistic information, and its main role is to anticipate which objects the speech could refer to next (e.g., Altmann and Kamide 1999).
In a production task, by contrast, visual attention plays an active role in deciding which objects in the scene should be mentioned in a sentence (e.g., Griffin and Bock 2000).

The relation between comprehension and production has been investigated mainly in the context of dialogue. A prominent account of how the two relate to each other is the interactive alignment model (Pickering and Garrod, 2007), which assumes that successful dialogue leads to aligned representations at every linguistic level, and that this alignment is supported by priming, i.e., the reuse of linguistic material.

Importantly, this process of alignment in dialogue has been observed to go beyond aligned linguistic representations; it also includes the gaze coordination of dialogue partners. Richardson et al. (2007) showed that the scan patterns of listeners and speakers engaged in a dialogue about six characters are coordinated. This coordination is subject to a characteristic temporal lag, with the same character being fixated consistently later by listeners than by speakers. This confirms that visual responses during comprehension are launched after the linguistic material is understood, whereas in production, visual responses are launched prior to or during sentence generation.

These results strongly suggest the existence of alignment mechanisms that underlie the coordination of comprehension and production processes. However, especially with respect to the evidence for gaze coordination, it is unclear what the role of visual and linguistic information is, and whether the characteristic lag underlying gaze coordination depends on such information. In Richardson et al.
(2007), in fact, the visual information available to the participants is not naturalistically situated (i.e., six portrait pictures of characters from TV serials), and the linguistic information used by the speaker to guide the listener, besides referring to a depicted character, does not actively interact with it. As a result, the gaze coordination obtained in the dialogue is achieved through a shallow process of character identification: the speaker talks about X, and the listener looks at X with a constant delay.

In this paper, we present a study in which we explicitly investigate how linguistic and visual information interact to produce coordinated scan patterns. We explore coordination at different levels of granularity, from the macro-level of the whole trial down to the level of individual objects. Moreover, we test how coordination is influenced by the visual and linguistic referential information shared in comprehension and production, focusing on the animacy of the subject of the sentence, which has been shown to influence both linguistic and visual responses (Coco and Keller, 2010), and on the number of targets (visual referents associated with the subject).

Our main hypothesis is that the characteristic lag underlying the scan pattern coordination between comprehension and production emerges only when sentence processing is actively involved, and that it is directly influenced by the properties of the visual and linguistic information being processed. In particular, scan patterns are expected to show less coordination on a single animate target, as the associated information spans a wider range of contextual possibilities. In contrast, the low linguistic relevance of an inanimate target, and the referential ambiguity of multiple targets, should force participants to depend more strongly on contextual scene information, thus triggering a higher degree of coordination.
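The core of categorical cross-recurrence analysis (in the spirit of Richardson et al. 2007) can be sketched as follows: two fixation streams sampled at the same rate are compared at a range of temporal lags, and the recurrence rate at each lag is the proportion of time points at which both participants fixate the same object. This is a minimal illustrative sketch, not the analysis code used in this study; the object labels, sequences, and function names are hypothetical.

```python
def cross_recurrence(seq_a, seq_b, lag):
    """Recurrence rate between two categorical fixation streams at a
    given lag (in samples). A positive lag asks whether seq_b fixates
    at time t+lag the object that seq_a fixated at time t."""
    if lag >= 0:
        pairs = list(zip(seq_a, seq_b[lag:]))
    else:
        pairs = list(zip(seq_a[-lag:], seq_b))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)

def lag_profile(seq_a, seq_b, max_lag):
    """Recurrence rate for every lag in [-max_lag, max_lag]; the lag
    with the highest rate estimates the temporal offset between the
    two scan patterns."""
    return {lag: cross_recurrence(seq_a, seq_b, lag)
            for lag in range(-max_lag, max_lag + 1)}

# Toy fixation streams (object labels per sample); the listener
# revisits each object one sample after the speaker.
speaker  = ["man", "man", "bag", "bag", "bag", "door", "door"]
listener = ["man", "man", "man", "bag", "bag", "bag", "door"]
profile = lag_profile(speaker, listener, max_lag=2)
```

In this toy example the profile peaks at lag +1, reflecting the characteristic listener-behind-speaker delay discussed above.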
Experiment

Our study aims to explore the role of referential factors in the temporal dynamics of scan pattern coordination between language comprehension and production during the description
References

[1] G. Altmann, et al. Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition, 1999.
[2] N. Marwan, et al. Nonlinear analysis of bivariate data with cross recurrence plots. arXiv:physics/0201061, 2002.
[3] Daniel C. Richardson, et al. Nominal cross recurrence as a generalized lag sequential analysis for behavioral streams. International Journal of Bifurcation and Chaos, 2011.
[4] Daniel C. Richardson, et al. The art of conversation is coordination. Psychological Science, 2007.
[5] R. Baayen, et al. Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 2008.
[6] Gregory J. Zelinsky, et al. Synchronizing visual and language processing: an effect of object name length on eye movements. Psychological Science, 2000.
[7] Moreno I. Coco, et al. Sentence production in naturalistic scenes with referential ambiguity. 2010.
[8] Yuanzhen Li, et al. Measuring visual clutter. Journal of Vision, 2007.
[9] Zenzi M. Griffin, et al. What the eyes say about speaking. Psychological Science, 2000.
[10] M. Pickering, et al. Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 2007.