Incremental Effects of Mismatch during Picture-Sentence Integration: Evidence from Eye-tracking

Pia Knoeferle (knoeferle@coli.uni-sb.de)
Department of Computational Linguistics, Saarland University, 66041 Saarbrücken, Germany

Matthew W. Crocker (crocker@coli.uni-sb.de)
Department of Computational Linguistics, Saarland University, 66041 Saarbrücken, Germany

Abstract

A model of sentence-picture integration developed by Carpenter and Just (1975) predicts that the ease or difficulty of picture-sentence integration depends on picture-sentence match or mismatch, respectively. Recent findings by Underwood, Jebbett, and Roberts (2004), however, fail to find a match/mismatch difference for serial picture-sentence presentation in a sentence verification study. In a sentence comprehension study with serial picture-sentence presentation, we find no match/mismatch effect in total sentence inspection times. However, inspection times for individual sentence regions reveal a mismatch effect at the very sentence constituent for which the corresponding picture constituent mismatches, and this in a study with a sentence comprehension rather than verification task. Drawing on insights about spoken sentence comprehension during the inspection of concurrent scenes, we suggest that the absence of a mismatch effect in the Underwood et al. studies might be due to the grain size of their gaze time analyses.

Introduction

How do we integrate what we see in a scene with a sentence that we read? Answering this question is of interest in various types of comprehension situations, such as when we read comic books (Carroll, Young, & Guertin, 1992), newspaper advertisements (Rayner, Rotello, Stewart, Keir, & Duffy, 2001), or inspect scientific diagrams (Feeney, Hola, Liversedge, Findlay, & Metcalf, 2003).

One account of how a picture and a sentence are integrated is the "Constituent Comparison Model" (CCM) by Carpenter and Just (1975). They suggest that people build a mental representation of sentence and picture constituents, and that the corresponding constituents of sentence and picture are then serially compared with one another. Their model of sentence verification accounts for response latencies in a number of sentence-picture verification studies by attributing differences in the response latencies to congruence or incongruence between sentence and picture (e.g., Gough, 1965; Just & Carpenter, 1971).

In a sentence verification task, Just and Carpenter (1971) presented people with a picture of either red or black dots, followed by a related written sentence. Sentence verification response latencies were shorter when the colour adjective in the sentence (red) matched the colour of the depicted dots (red) than when it did not match their colour (black). The CCM predicts precisely that when there is a match between a picture and a sentence, their integration should be faster than when a picture and a sentence do not match.

A Model of Incremental Sentence-Picture Comparison?

The CCM has received strong support from off-line response latencies in verification tasks, and is primarily a model of sentence-picture verification. The model specifies, at least to some extent, how the integration of a picture and a written sentence proceeds incrementally: by serially comparing the representations of sentence and corresponding picture constituents.

Reaction times are appropriate for testing the complexity of sentence-picture integration steps. However, for truly examining the incremental integration of picture- and sentence-based mental representations, they are less informative than other, on-line measures such as eye-tracking. In sentence-picture integration research, few studies have monitored eye movements during sentence reading (e.g., Carroll et al., 1992; Underwood et al., 2004). Among recent studies in the sentence-picture verification paradigm that have employed eye-tracking, findings by Underwood et al. (2004) have challenged the validity of the CCM. They have further identified important additional factors (e.g., order of picture-sentence presentation) that affect their integration.

In two eye-tracking studies with a sentence-picture verification task, Underwood et al. (2004) examined the effect of presentation order for real-world photographs and captions. They report total inspection time, number of fixations, and durations of fixations for the entire sentence and picture, in addition to response latencies. In Experiment 1, picture and caption were presented together, and congruence was manipulated (match/mismatch). Results confirmed the established match/mismatch effect: response latencies were longer for the mismatch than for the match condition. Total inspection times and number of fixations further confirmed this finding.

In Experiment 2, order of presentation (picture-first, sentence-first) was introduced as a condition in addition to the match/mismatch manipulation. Crucially, and in contrast to Experiment 1, there was no match/mismatch effect in Experiment 2 in either response latencies or inspection times for the entire sentence. Response accuracy was relatively high (83.6 and 79.2 percent for match
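The serial constituent comparison at the heart of the CCM can be sketched in a few lines. This is a minimal illustration only, assuming a simplified version of the model in which each constituent pair costs one comparison operation and a mismatch costs one extra operation; the millisecond parameters are arbitrary placeholders, and the published model's full machinery (e.g., its treatment of negation) is not reproduced here.

```python
# Simplified sketch of the Constituent Comparison Model's prediction:
# sentence and picture constituents are compared serially, and each
# mismatch adds an extra operation, lengthening predicted latency.

def verify(sentence, picture, base_ms=200, op_ms=100):
    """Return (truth_value, predicted_latency_ms) for a constituent-by-
    constituent comparison of sentence and picture representations."""
    ops = 0
    match = True
    for s, p in zip(sentence, picture):
        ops += 1          # one comparison per constituent pair
        if s != p:
            ops += 1      # mismatch: one extra (re-)comparison operation
            match = False
    return match, base_ms + op_ms * ops

# A matching pair is predicted to be verified faster:
verify(("dots", "red"), ("dots", "red"))    # (True, 400)
verify(("dots", "red"), ("dots", "black"))  # (False, 500)
```

On this toy formulation, latency grows linearly with the number of comparison operations, which is the property the response-latency studies above exploit.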
[1] Julie C. Sedivy, et al. 1995.
[2] K. Rayner. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 1998.
[3] Marcel Adam Just, et al. Sentence comprehension: A psycholinguistic processing model of verification. 1975.
[4] Julie C. Sedivy, et al. Achieving incremental semantic interpretation through contextual representation. Cognition, 1999.
[5] G. Altmann, et al. Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition, 1999.
[6] P. Carroll, et al. Visual Analysis of Cartoons: A View from the Far Side. 1992.
[7] John M. Findlay, et al. How People Extract Information from Graphs: Evidence from a Sentence-Graph Verification Paradigm. Diagrams, 2000.
[8] Philip B. Gough, et al. Grammatical transformations and speed of understanding. 1965.
[9] Andrew J. Stewart, et al. Integrating text and pictorial information: eye movements when looking at print advertisements. Journal of Experimental Psychology: Applied, 2001.
[10] Pia Knoeferle, et al. The coordinated processing of scene and utterance: evidence from eye tracking. 2004.
[11] Matthew W. Crocker, et al. The influence of the immediate visual context on incremental thematic role-assignment: evidence from eye-movements in depicted events. Cognition, 2005.
[12] Geoffrey Underwood, et al. Inspecting Pictures for Information to Verify a Sentence: Eye Movements in General Encoding and in Focused Search. The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 2004.
[13] P. Goolkasian, et al. Picture-word differences in a sentence verification task. Memory & Cognition, 1996.
[14] Daniel C. Richardson, et al. Representation, space and Hollywood Squares: looking at things that aren't there anymore. Cognition, 2000.
[15] Zenzi M. Griffin, et al. Why Look? Reasons for Eye Movements Related to Language Production. 2004.
[16] Daniel C. Richardson, et al. Thinking outside the brain: Spatial indices to visual and linguistic information. 2004.
[17] G. Altmann. Language-mediated eye movements in the absence of a visual world: the ‘blank screen paradigm’. Cognition, 2004.