Referring and Gaze Alignment: Accessibility is Alive and Well in Situated Dialogue

Ellen Gurman Bard (ellen@ling.ed.ac.uk)
Linguistics and English Language, University of Edinburgh, Edinburgh EH8 9LL, UK

Robin Hill (r.l.hill@ed.ac.uk)
Human Communication Research Centre, University of Edinburgh, Edinburgh EH8 9LW, UK

Manabu Arai (manabu.arai@ed.ac.uk)
Human Communication Research Centre, University of Edinburgh, Edinburgh EH8 9LW, UK

Abstract

Accessibility theory (Ariel, 1988; Gundel, Hedberg, & Zacharski, 1993) proposes that the grammatical form of a referring expression depends on the accessibility of its referent, with greater accessibility permitting more reduced expressions. From whose perspective is accessibility measured? Recent experiments (Bard, Hill, & Foster, 2008; de Ruiter & Lamers, submitted) using a joint construction task suggest that the speaker’s view often determines referential form. Two objections to these results would neutralize accessibility predictions in many real-world situations. First, objects in shared visual space may be so salient that all will be highly accessible and reference to them in whatever form cannot fail (Smith, Noda, Andrews, & Jucker, 2005). Second, since joint action demands joint attention, the listener’s and speaker’s views of what is accessible should seldom differ. We use cross-recurrence analysis of interlocutors’ gaze to show that neither objection applies. Gaze is not always well aligned. Dyads whose referring expressions ignored listeners’ needs did not coordinate attention well. Dyads referring cooperatively coordinated attention better, and in a way linked to the elaboration of their referring expressions.

Keywords: reference, accessibility, corpus experimental studies, pragmatics, situated dialogue

Introduction

The question “How shall a thing be called?” (Brown, 1958) still engages anyone who deals with human or machine language production.
One very wide-ranging approach (Ariel, 1988, 1990, 2001) attempts to key the elaboration of the form of a referring expression to how difficult the producer of the expression estimates it will be to access the referent concept, discourse entity, or extra-linguistic object. Expressions introducing entities deemed completely unfamiliar to the audience should be maximally detailed, as in, for example, indefinite NPs including modifiers of various kinds (‘a former Republican senator from strongly Democratic Massachusetts’). Expressions of intermediate accessibility might be definite NPs, deictic expressions, or personal pronouns, in that order. Expressions referring to a single most immediately mentioned entity in focus can be as minimal as so-called clitics, unstressed and all but deleted pronouns (‘/z/ in the garage’). Accessibility theory thus offers a unified framework for predicting how the forms of referring expressions will respond to givenness, discourse focus, and inferrability from local scenarios. Accessibility ought to include the effects of any conditions which might draw attention to the correct referent.

Our research asks whose attention it is that determines referential form, and whether, in situations where both a speaker and a listener are present, there is any point in attempting to distinguish between them. Ariel’s (2001) notion of accessibility depends on what the speaker supposes is the case, not on what is genuinely easier for the listener. Opinions differ on how firmly speakers’ suppositions are based on evidence about listeners’ genuine states, both in the design of referring expressions and in other aspects of behaviour.
While the accessibility of referring expressions was more sensitive to the knowledge of the listener than was clarity of articulation (Bard & Aylett, 2004), speakers’ tendencies to match nomenclature to listeners’ history or current situation are quite variable (Brennan & Clark, 1996; Horton & Gerrig, 2002, 2005a; Horton & Keysar, 1996; Keysar, Lin, & Barr, 2003). Though speakers may construct careful models of their interlocutors (Brennan & Clark, 1996), they may be unwilling or unable to recall or deploy any such model in a timely fashion (Bard et al., 2000; Bard & Aylett, 2004; Horton & Gerrig, 2002, 2005a, 2005b; Horton & Keysar, 1996). It may be much easier to adopt a global account of a situation than to construct an incremental, evidence-contingent plan: for example, when speakers can see the eye track of their interlocutors during a shared task, their search patterns may differ from those they follow without this cue (Bard et al., 2007; Brennan, Chen, Dickinson, Neider, & Zelinsky, 2007), but when the listener’s eye track indicates an error, they may fail to make individually contingent responses (Bard et al., 2007).

Two recent experiments have explored factors that make speakers more or less sensitive to their listeners’ knowledge. Both studies used a joint physical and visual task which makes it possible to vary participants’ knowledge and responsibilities. Figure 1 illustrates the task. In the Joint Construction Task, two players cooperate to construct a tangram in a shared workspace represented on their yoked screens. Each trial offers a new target tangram using a non-exhaustive selection from the same set of coloured geometric shapes. Each player can manipulate the component shapes or partly built tangrams by mouse actions, but two parts can be joined together only if they are
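The cross-recurrence analysis of interlocutors’ gaze mentioned in the abstract can be sketched in a minimal, categorical form: each participant’s gaze is a stream of fixated-object labels sampled at a fixed rate, and cross-recurrence at a given lag is the proportion of samples where one stream fixates the same object the other stream fixated that many samples earlier. This is an illustrative reconstruction of the general technique, not the paper’s analysis pipeline; all names (`cross_recurrence`, `recurrence_profile`, the shape labels) are hypothetical.

```python
def cross_recurrence(gaze_a, gaze_b, lag):
    """Proportion of samples where stream B at time t+lag fixates the
    same object as stream A at time t (positive lag: B follows A)."""
    if lag >= 0:
        pairs = list(zip(gaze_a, gaze_b[lag:]))
    else:
        pairs = list(zip(gaze_a[-lag:], gaze_b))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)

def recurrence_profile(gaze_a, gaze_b, max_lag):
    """Cross-recurrence at every lag in [-max_lag, +max_lag]."""
    return {lag: cross_recurrence(gaze_a, gaze_b, lag)
            for lag in range(-max_lag, max_lag + 1)}

# Toy example: the listener fixates each shape one sample after the speaker,
# so recurrence peaks at lag +1.
speaker  = ["tri", "tri", "sq", "sq", "circ"]
listener = ["x",   "tri", "tri", "sq", "sq"]
profile = recurrence_profile(speaker, listener, 2)
```

A dyad whose profile peaks near lag zero with high recurrence is coordinating attention closely; a flat, low profile indicates poorly aligned gaze, which is the pattern the paper associates with referring expressions that ignore the listener’s needs.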
References

[1] Matthew P. Aylett, et al. (2004). Referential form, word duration, and modelling the listener in spoken dialogue.
[2] Mira Ariel (1988). Referring and accessibility. Journal of Linguistics.
[3] Andreas H. Jucker, et al. (2005). Setting the stage: How speakers prepare listeners for the introduction of referents in dialogues and monologues.
[4] Julie C. Sedivy, et al. (1995).
[5] Jeanette K. Gundel, et al. (1993). Cognitive status and the form of referring expressions in discourse. Language.
[6] Mira Ariel (2001). Accessibility theory: An overview.
[7] M. Tanenhaus, et al. (2005). Approaches to studying world-situated language use: Bridging the language-as-product and language-as-action traditions.
[8] Wilbert Spooren, et al. (2001). Text representation: Linguistic and psycholinguistic aspects.
[9] A. Giuliani, et al. (1998). Detecting deterministic signals in exceptionally noisy environments using cross-recurrence quantification.
[10] Ipke Wachsmuth, et al. (2006). Deictic object reference in task-oriented dialogue.
[11] Yuki Kamide, et al. (2004). Now you see it, now you don’t: Mediating the mapping between language and the visual world.
[12] Yiya Chen, et al. (2007). Let’s you do that: Sharing the cognitive burdens of dialogue.
[13] Daniel C. Richardson, et al. (2005). Looking to understand: The coupling between speakers’ and listeners’ eye movements and its relationship to discourse comprehension. Cognitive Science.
[14] B. Keysar, et al. (1996). When do speakers take into account common ground? Cognition.
[15] E. Bard, et al. (2000). Controlling the intelligibility of referring expressions in dialogue.
[16] Daniel C. Richardson, et al. (2007). The art of conversation is coordination. Psychological Science.
[17] R. Gerrig, et al. (2002). Speakers’ experiences and audience design: Knowing when and knowing how to adjust utterances to addressees.
[18] R. Gerrig, et al. (2005). The impact of memory demands on audience design during language production. Cognition.
[19] H. H. Clark, et al. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition.
[20] A. Meyer, et al. (2004). Eye movements during speech planning: Talking about present and remembered objects.
[21] D. Barr, et al. (2003). Limits on theory of mind use in adults. Cognition.
[22] Alois Knoll, et al. (2008). The roles of haptic-ostensive referring expressions in cooperative, task-based human-robot dialogue. 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI).
[23] R. Brown (1958). How shall a thing be called? Psychological Review.
[24] Mira Ariel (1990). Accessing Noun-Phrase Antecedents.
[25] Z. Griffin (2004). Why look? Reasons for eye movements related to language production.
[26] Christopher A. Dickinson, et al. (2008). Coordinating cognition: The costs and benefits of shared gaze during collaborative search. Cognition.
[27] Robin L. Hill, et al. (2008). Who tunes accessibility of referring expressions in task-related dialogue?
[28] Jean Carletta, et al. (2010). Eyetracking for two-person tasks with manipulation of a virtual world. Behavior Research Methods.
[29] William S. Horton, et al. (2005). Conversational common ground and memory processes in language production.