Referring and Gaze Alignment: Accessibility is Alive and Well in Situated Dialogue

Ellen Gurman Bard (ellen@ling.ed.ac.uk)
Linguistics and English Language, University of Edinburgh, Edinburgh EH8 9LL, UK

Robin Hill (r.l.hill@ed.ac.uk)
Human Communication Research Centre, University of Edinburgh, Edinburgh EH8 9LW, UK

Manabu Arai (manabu.arai@ed.ac.uk)
Human Communication Research Centre, University of Edinburgh, Edinburgh EH8 9LW, UK

Abstract

Accessibility theory (Ariel, 1988; Gundel, Hedberg, & Zacharski, 1993) proposes that the grammatical form of a referring expression depends on the accessibility of its referent, with greater accessibility permitting more reduced expressions. From whose perspective is accessibility measured? Recent experiments (Bard, Hill, & Foster, 2008; de Ruiter & Lamers, submitted) using a joint construction task suggest that the speaker’s view often determines referential form. Two objections to these results would neutralize accessibility predictions in many real-world situations. First, objects in shared visual space may be so salient that all will be highly accessible and reference to them in whatever form cannot fail (Smith, Noda, Andrews, & Jucker, 2005). Second, since joint action demands joint attention, the listener’s and speaker’s views of what is accessible should seldom differ. We use cross-recurrence analysis of interlocutors’ gaze to show that neither objection applies. Gaze is not always well aligned. Dyads whose referring expressions ignored listeners’ needs did not coordinate attention well. Dyads referring cooperatively coordinated attention better, and in a way linked to the elaboration of their referring expressions.

Keywords: reference, accessibility, corpus experimental studies, pragmatics, situated dialogue

Introduction

The question “How shall a thing be called?” (Brown, 1958) still engages anyone who deals with human or machine language production.
One very wide-ranging approach (Ariel, 1988, 1990, 2001) attempts to key the elaboration of the form of a referring expression to how difficult the producer of the expression estimates it will be to access the referent concept, discourse entity, or extra-linguistic object. Expressions introducing entities deemed completely unfamiliar to the audience should be maximally detailed, as in, for example, indefinite NPs including modifiers of various kinds (‘a former Republican senator from strongly Democratic Massachusetts’). Expressions of intermediate accessibility might be definite NPs, deictic expressions, or personal pronouns, in that order. Expressions referring to a single most immediately mentioned entity in focus can be as minimal as so-called clitics, unstressed and all but deleted pronouns (‘/z/ in the garage’). Accessibility theory thus offers a unified framework for predicting how the forms of referring expressions will respond to givenness, discourse focus, and inferrability from local scenarios. Accessibility ought to include the effects of any conditions which might draw attention to the correct referent.

Our research asks whose attention it is that determines referential form, and whether, in situations where both a speaker and a listener are present, there is any point in attempting to distinguish between them. Ariel’s (2001) notion of accessibility depends on what the speaker supposes is the case, not on what is genuinely easier for the listener. Opinions differ on how firmly speakers’ suppositions are based on evidence about listeners’ genuine states, both in the design of referring expressions and in other aspects of behaviour.
While the accessibility of referring expressions was more sensitive to the knowledge of the listener than was clarity of articulation (Bard & Aylett, 2004), speakers’ tendencies to match nomenclature to listeners’ history or current situation are quite variable (Brennan & Clark, 1996; Horton & Gerrig, 2002, 2005a; Horton & Keysar, 1996; Keysar, Lin, & Barr, 2003). Though speakers may construct careful models of their interlocutors (Brennan & Clark, 1996), they may be unwilling or unable to recall or deploy any such model in a timely fashion (Bard et al., 2000; Bard & Aylett, 2004; Horton & Gerrig, 2002, 2005a, 2005b; Horton & Keysar, 1996). It may be much easier to adopt a global account of a situation than to construct an incremental, evidence-contingent plan: for example, when speakers can see the eye track of their interlocutors during a shared task, their search patterns may differ from those they follow without this cue (Bard et al., 2007; Brennan, Chen, Dickinson, Neider, & Zelinsky, 2007), but when the listener’s eye track indicates an error, they may fail to make individually contingent responses (Bard et al., 2007).

Two recent experiments have explored factors that make speakers more or less sensitive to their listeners’ knowledge. Both studies used a joint physical and visual task which makes it possible to vary participants’ knowledge and responsibilities. Figure 1 illustrates the task. In the Joint Construction Task, two players cooperate to construct a tangram in a shared workspace represented on their yoked screens. Each trial offers a new target tangram using a non-exhaustive selection from the same set of coloured geometric shapes. Each player can manipulate the component shapes or partly built tangrams by mouse actions, but two parts can be joined together only if they are
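The cross-recurrence analysis of interlocutors’ gaze mentioned in the abstract can be sketched in a minimal, categorical form: each participant’s gaze is a stream of fixated-object labels sampled at a fixed rate, and cross-recurrence at a given lag is the proportion of samples where one stream fixates the same object the other stream fixated that many samples earlier. This is an illustrative reconstruction of the general technique, not the paper’s analysis pipeline; all names (`cross_recurrence`, `recurrence_profile`, the shape labels) are hypothetical.

```python
def cross_recurrence(gaze_a, gaze_b, lag):
    """Proportion of samples where stream B at time t+lag fixates the
    same object as stream A at time t (positive lag: B follows A)."""
    if lag >= 0:
        pairs = list(zip(gaze_a, gaze_b[lag:]))
    else:
        pairs = list(zip(gaze_a[-lag:], gaze_b))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)

def recurrence_profile(gaze_a, gaze_b, max_lag):
    """Cross-recurrence at every lag in [-max_lag, +max_lag]."""
    return {lag: cross_recurrence(gaze_a, gaze_b, lag)
            for lag in range(-max_lag, max_lag + 1)}

# Toy example: the listener fixates each shape one sample after the speaker,
# so recurrence peaks at lag +1.
speaker  = ["tri", "tri", "sq", "sq", "circ"]
listener = ["x",   "tri", "tri", "sq", "sq"]
profile = recurrence_profile(speaker, listener, 2)
```

A dyad whose profile peaks near lag zero with high recurrence is coordinating attention closely; a flat, low profile indicates poorly aligned gaze, which is the pattern the paper associates with referring expressions that ignore the listener’s needs.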
References

[1] Matthew P. Aylett, et al. (2004). Referential form, word duration, and modelling the listener in spoken dialogue.
[2] Mira Ariel (1988). Referring and accessibility. Journal of Linguistics.
[3] Andreas H. Jucker, et al. (2005). Setting the stage: How speakers prepare listeners for the introduction of referents in dialogues and monologues.
[4] Julie C. Sedivy, et al. (1995).
[5] Jeanette K. Gundel, et al. (1993). Cognitive status and the form of referring expressions in discourse. Language.
[6] Mira Ariel (2001). Accessibility theory: An overview.
[7] M. Tanenhaus, et al. (2005). Approaches to studying world-situated language use: Bridging the language-as-product and language-as-action traditions.
[8] Wilbert Spooren, et al. (2001). Text representation: Linguistic and psycholinguistic aspects.
[9] A. Giuliani, et al. (1998). Detecting deterministic signals in exceptionally noisy environments using cross-recurrence quantification.
[10] Ipke Wachsmuth, et al. (2006). Deictic object reference in task-oriented dialogue.
[11] Yuki Kamide, et al. (2004). Now you see it, now you don’t: Mediating the mapping between language and the visual world.
[12] Yiya Chen, et al. (2007). Let’s you do that: Sharing the cognitive burdens of dialogue.
[13] Daniel C. Richardson, et al. (2005). Looking to understand: The coupling between speakers’ and listeners’ eye movements and its relationship to discourse comprehension. Cognitive Science.
[14] B. Keysar, et al. (1996). When do speakers take into account common ground? Cognition.
[15] E. Bard, et al. (2000). Controlling the intelligibility of referring expressions in dialogue.
[16] Daniel C. Richardson, et al. (2007). The art of conversation is coordination. Psychological Science.
[17] R. Gerrig, et al. (2002). Speakers’ experiences and audience design: Knowing when and knowing how to adjust utterances to addressees.
[18] R. Gerrig, et al. (2005). The impact of memory demands on audience design during language production. Cognition.
[19] H. H. Clark, et al. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition.
[20] A. Meyer, et al. (2004). Eye movements during speech planning: Talking about present and remembered objects.
[21] D. Barr, et al. (2003). Limits on theory of mind use in adults. Cognition.
[22] Alois Knoll, et al. (2008). The roles of haptic-ostensive referring expressions in cooperative, task-based human-robot dialogue. 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[23] R. Brown (1958). How shall a thing be called? Psychological Review.
[24] Mira Ariel (1990). Accessing Noun-Phrase Antecedents.
[25] Z. Griffin (2004). Why look? Reasons for eye movements related to language production.
[26] Christopher A. Dickinson, et al. (2008). Coordinating cognition: The costs and benefits of shared gaze during collaborative search. Cognition.
[27] Robin L. Hill, et al. (2008). Who tunes accessibility of referring expressions in task-related dialogue?
[28] Jean Carletta, et al. (2010). Eyetracking for two-person tasks with manipulation of a virtual world. Behavior Research Methods.
[29] William S. Horton, et al. (2005). Conversational common ground and memory processes in language production.