Does domain size impact speech onset time during reference production?

Albert Gatt (albert.gatt@um.edu.mt)
Institute of Linguistics, University of Malta
Tilburg center for Cognition and Communication (TiCC), Tilburg University

Roger P.G. van Gompel (r.p.g.vangompel@dundee.ac.uk)
School of Psychology, University of Dundee

Emiel Krahmer (e.j.krahmer@uvt.nl)
Tilburg Center for Cognition and Communication (TiCC), Tilburg University

Kees van Deemter (k.vdeemter@abdn.ac.uk)
Department of Computing Science, University of Aberdeen

Abstract

In referring to a target referent, speakers need to choose a set of properties that jointly distinguish it from its distractors. Current computational models view this as a search process in which the decision to include a property requires checking how many distractors it excludes. Thus, these models predict that identifying descriptions should take longer to produce the larger the distractor set is, independent of how many properties are required to identify a target. Since every property that is selected is checked, they also predict that distinguishing a target should take longer the more properties are required to distinguish it. This paper tests this prediction empirically, contrasting it with two alternative predictions based on models of visual search. Our results provide support for the predictions of computational models, suggesting a crucial difference between the mechanisms underlying reference production and object identification.

Keywords: Referring expressions, language production, visual search, computational modeling

Figure 1: An example domain

Introduction

When a speaker refers to a target referent in a visual domain, she identifies it for an addressee by using properties which distinguish it from its distractors.
For example, in order to identify the object surrounded by a red border in Figure 1, a speaker needs to refer to it using both its colour and its size (the large blue aeroplane); leaving out either of these properties would result in an underspecified description.

Most psycholinguistic accounts of reference in such domains assume that the discriminatory value of properties plays an important role, since the objective is to identify an object for the addressee (Olson, 1970). On the other hand, it is also well-established that certain properties are ‘preferred’ in that speakers often include them when they are not required to distinguish the target, thus producing overspecified descriptions (Pechmann, 1989; Belke & Meyer, 2002; Arts, 2004).

The present paper is concerned with the mechanisms underlying the selection of properties. Specifically, we ask whether this process is best viewed as a search, along the lines suggested by current computational models of Referring Expression Generation (REG; see Krahmer & van Deemter, 2012, for a survey). In these models (described more fully in the next section), the decision to include a property in a description requires checking it against the distractor set to determine whether it excludes at least some of them. If speakers do perform such a procedure, then larger domain sizes should result in more effort (and this should be indicated, for example, by increased speech onset times). This is because more objects have to be checked every time a property is considered for inclusion. This prediction is compatible with a classic finding in the visual search and attention literature, where search time has been shown to increase linearly with domain size (Treisman & Gelade, 1980).
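The search procedure described above can be illustrated with a minimal sketch. It is not the specific model tested in this paper: the function name `select_properties`, the attribute names, the preference order, and the use of a simple check counter as a stand-in for effort are all assumptions made for illustration. Each candidate property is checked against every distractor that has not yet been excluded, and it is kept only if it excludes at least one of them.

```python
from typing import Dict, List, Tuple

Entity = Dict[str, str]  # hypothetical representation: attribute -> value


def select_properties(
    target: Entity,
    distractors: List[Entity],
    preference_order: List[str],
) -> Tuple[List[str], int]:
    """Return the selected attributes and the number of distractor checks made."""
    remaining = list(distractors)
    selected: List[str] = []
    checks = 0
    for attr in preference_order:
        if not remaining:
            break  # the target is already distinguished
        # Every remaining distractor is checked against the candidate property.
        checks += len(remaining)
        survivors = [d for d in remaining if d.get(attr) == target.get(attr)]
        if len(survivors) < len(remaining):
            # The property excludes at least one distractor, so it is included.
            selected.append(attr)
            remaining = survivors
    return selected, checks


# Illustrative domain modelled on the example of the large blue aeroplane.
target = {"type": "aeroplane", "colour": "blue", "size": "large"}
small_domain = [
    {"type": "aeroplane", "colour": "blue", "size": "small"},
    {"type": "aeroplane", "colour": "red", "size": "large"},
]
large_domain = small_domain * 2  # same property structure, twice the distractors
preference_order = ["type", "colour", "size"]  # assumed order, for illustration

small_result = select_properties(target, small_domain, preference_order)
large_result = select_properties(target, large_domain, preference_order)
```

Under these assumptions, both domains yield the same description (colour and size), but the larger domain requires more checks, which is the sense in which search-based models predict greater effort as domain size grows.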
However, whereas REG models predict an impact of domain size irrespective of the number of properties required to distinguish a target referent, the task used by Treisman and Gelade only evinces a linear increase with targets distinguished by a conjunction of properties (e.g. blue and large). When targets are distinguished by a single property, a ‘pop-out’ effect is observed and domain size has no impact.

Yet a third possibility is suggested by more recent visual search models (e.g. Itti & Koch, 2001), which give a more central role to parallel processing. In these models,
[1] Robert Dale, et al. Viewing Referring Expression Generation as Search. IJCAI, 2005.
[2] F. Ferreira, et al. Over-specified referring expressions impair comprehension: An ERP study. Brain and Cognition, 2011.
[3] Kees van Deemter. Generating Referring Expressions that Involve Gradable Properties. CL, 2006.
[4] B. Rossion, et al. Revisiting Snodgrass and Vanderwart's Object Pictorial Set: The Role of Surface Detail in Basic-Level Object Recognition. Perception, 2004.
[5] Julie C. Sedivy, et al. Achieving incremental semantic interpretation through contextual representation. Cognition, 1999.
[6] Kenneth I. Forster, et al. DMDX: A Windows display program with millisecond accuracy. Behavior research methods, instruments, & computers: a journal of the Psychonomic Society, Inc, 2003.
[7] Robert Dale, et al. Computational Interpretations of the Gricean Maxims in the Generation of Referring Expressions. Cogn. Sci., 1995.
[8] Julie C. Sedivy, et al. Subject Terms: Linguistics Language Eyes & eyesight Cognition & reasoning. 1995.
[9] Athanassios Protopapas, et al. Check Vocal: A program to facilitate checking the accuracy and response time of vocal responses from DMDX. Behavior research methods, 2007.
[10] A. Meyer, et al. Tracking the time course of multidimensional stimulus discrimination: Analyses of viewing patterns and processing times during “same”-“different” decisions. 2002.
[11] D. R. Olson, et al. Language and thought: aspects of a cognitive theory of semantics. Psychological review, 1970.
[12] Emiel Krahmer, et al. Computational Generation of Referring Expressions: A Survey. CL, 2012.
[13] A. Arts, et al. Overspecification in instructive texts. 2004.
[14] C. Koch, et al. Computational modelling of visual attention. Nature Reviews Neuroscience, 2001.
[15] T. Pechmann. Incremental speech production and referential overspecification. 1989.
[16] Robert Dale, et al. Cooking Up Referring Expressions. ACL, 1989.
[17] Ehud Reiter, et al. Book Reviews: Building Natural Language Generation Systems. CL, 2000.
[18] Michael J. Spivey, et al. Linguistically Mediated Visual Search. Psychological science, 2001.
[19] A. Treisman, et al. A feature-integration theory of attention. Cognitive Psychology, 1980.