Referential form, word duration, and modelling the listener in spoken dialogue

E. G. Bard (ellen@ling.ed.ac.uk)
HCRC and Department of Theoretical and Applied Linguistics, Adam Ferguson Building, University of Edinburgh, Edinburgh EH8 9LL, U.K.

M. P. Aylett (matthewa@cstr.ed.ac.uk)
HCRC and Rhetorical Systems, 2 Buccleuch Place, University of Edinburgh, Edinburgh EH8 9LW, U.K.

Abstract

Referring expressions are thought to be tailored to the needs of the listener, even when those needs might be costly to assess, but tests of this claim seldom manipulate listener’s and speaker’s knowledge independently. The design of the HCRC Map Task enables us to do so. We examine two ‘tailoring’ changes in repeated mentions of landmark names: faster articulation and simplified referring expressions. Articulation results replicate Bard et al. (2000), depending only on what the speaker has heard. Change between mentions was no greater when it could be inferred that the listener could see the named item (Expt 1), and no less when the listener explicitly denied ability to do so (Expt 2). Word duration fell for speaker-Given listener-New items (Expt 3). Reduction was unaffected by the repeater’s ability to see the mentioned landmark (Expt 4). In contrast, referential form was more sensitive to both listener- (Expt 3) and speaker-knowledge (Expt 4). The results conform most closely to a Dual Process model: fast, automatic processes let the speaker-knowledge prime word articulation, while costly assessments of listener-knowledge influence only referential form.

Introduction

Speakers are said to design their utterances to suit the needs of their listeners, insofar as those needs can be known (Ariel, 1990; Clark & Marshall, 1981; Gundel, Hedberg, & Zacharski, 1993; Lindblom, 1990). Certainly, there is variation in form. Clarity of pronunciation varies with predictability from local context (Hunnicutt, 1985; Lieberman, 1963) and with repeated mention (Fowler & Housum, 1987).
Referential forms are syntactically simpler the more readily interpreted or ‘accessible’ their antecedents are (a blacksmith’s cottage vs. it) (Ariel, 1990; Fowler, Levy, & Brown, 1997; Gundel et al., 1993; Vonk, Hustinx, & Simmons, 1992).

Yet maintaining an incrementally updated model of what the listener knows, what is established common ground, and what the listener needs to know is a considerable cognitive task. Because speaker’s and listener’s knowledge overlap and because it may be impossible to assess the latter accurately, speakers may default to an account of their own knowledge as a proxy for the listener’s (Clark & Marshall, 1981). In fact, many studies simply assume that the two are the same: they manipulate the speaker’s knowledge without independently manipulating the listener’s (see Keysar, 1997).

This paper compares two versions of the hypothesis that referring expressions are genuinely tailored to the addressee. One deals with the articulation of individual words, the other with the syntactic form of referring expressions. Under current models of language production, NP structure and articulation are generated within units of different sizes: intonational or syntactic phrases on the one hand, and phonological words, lexical words, or syllables on the other (Levelt & Wheeldon, 1994; Smith & Wheeldon, 1999; Wheeldon & Lahiri, 1997). Moreover, speech appears to be produced in a cascade, with a sequence of smaller units being prepared for articulation even as the succeeding larger unit is being designed. Thus, incrementally updating a listener model in order to articulate each phonological word appropriately would impose a much heavier computational burden than updating it phrase by phrase. Making both kinds of update for the processes running in parallel would be even more demanding, with the listener model operating both in the state appropriate to the most recently produced word and in the state created by the most recently planned phrase.
We will first develop existing hypotheses about how speakers model listeners while planning and producing speech. Then we will report four studies which test these hypotheses on materials from a single corpus. They follow the comparisons made by Bard et al. (2000) on a psychological measure of clarity, the intelligibility to naive listeners (recognition rate) of a balanced sample of excised spoken words. The present paper reports a phonetic measure of clarity (word duration) and a syntactic measure of referential form for all suitable cases in a dialogue corpus. Finally, we will discuss the implications of the comparison.

[1] Jeanette K. Gundel, et al. Cognitive Status and the form of Referring Expressions in Discourse, 1993, The Oxford Handbook of Reference.

[2] C. Fowler, et al. Talkers' signaling of new and old words in speech and listeners' perception and use of the distinction, 1987.

[3] Alan S. Brown, et al. Persistent repetition priming in picture naming and its dissociation from recognition memory, 1988, Journal of Experimental Psychology: Learning, Memory, and Cognition.

[4] E. Bard, et al. Controlling the Intelligibility of Referring Expressions in Dialogue, 2000.

[5] B. Webber, et al. Elements of Discourse Understanding, 1983.

[6] Arthur G. Samuel, et al. Articulation Quality Is Inversely Related to Redundancy When Children or Adults Have Verbal Control, 1998.

[7] Mira Ariel. Accessing Noun-Phrase Antecedents, 1990.

[8] Stephen Isard, et al. Segment durations in a syllable frame, 1991.

[9] Julie E. Boland, et al. Priming in pronunciation: Beyond pattern recognition and onset latency, 1989.

[10] Boaz Keysar, et al. Unconfounding common ground, 1997.

[11] E. Bard, et al. The unintelligibility of speech to children, 1983, Journal of Child Language.

[12] H. H. Clark, et al. References in Conversation Between Experts and Novices, 1987.

[13] P. Lieberman. Some Effects of Semantic and Grammatical Context on the Production and Perception of Speech, 1963.

[14] Carol A. Fowler, et al. Reductions of Spoken Words in Certain Discourse Contexts, 1997.

[15] Susan R. Fussell, et al. Coordination of knowledge in communication: effects of speakers' assumptions about what others know, 1992, Journal of Personality and Social Psychology.

[16] Ellen F. Prince, et al. Toward a taxonomy of given-new information, 1981.

[17] A. Marchal, et al. Speech production and speech modelling, 1990.

[18] G. Dell, et al. Adapting production to comprehension: The explicit mention of instruments, 1987, Cognitive Psychology.

[19] A. Lahiri, et al. Prosodic Units in Speech Production, 1997.

[20] Mark C. Smith, et al. High level processing scope in spoken sentence production, 1999, Cognition.

[21] W. Levelt, et al. Do speakers have access to a mental syllabary?, 1994, Cognition.

[22] C. Fowler. Differential Shortening of Repeated Content Words Produced in Various Communicative Contexts, 1988, Language and Speech.

[23] Sharon Hunnicutt, et al. Intelligibility Versus Redundancy - Conditions of Dependency, 1985.

[24] B. Keysar, et al. When do speakers take into account common ground?, 1996, Cognition.

[25] E. Bard, et al. The unintelligibility of speech to children: effects of referent availability, 1994, Journal of Child Language.

[26] H. H. Clark, et al. Conceptual pacts and lexical choice in conversation, 1996, Journal of Experimental Psychology: Learning, Memory, and Cognition.

[27] Wietske Vonk, et al. The use of referential expressions in structuring discourse, 1992.