Paper information - Referential form, word duration, and modelling the listener in spoken dialogue

Referential Form, Word Duration, and Modeling the Listener in Spoken Dialogue

E. G. Bard (ellen@ling.ed.ac.uk)
HCRC and Department of Theoretical and Applied Linguistics, Adam Ferguson Building, University of Edinburgh, Edinburgh EH8 9LL, U.K.

M. P. Aylett (matthewa@cstr.ed.ac.uk)
HCRC and Rhetorical Systems, 2 Buccleuch Place, University of Edinburgh, Edinburgh EH8 9LW, U.K.

Abstract

Referring expressions are thought to be tailored to the needs of the listener, even when those needs might be costly to assess, but tests of this claim seldom manipulate listener’s and speaker’s knowledge independently. The design of the HCRC Map Task enables us to do so. We examine two ‘tailoring’ changes in repeated mentions of landmark names: faster articulation and simplified referring expressions. Articulation results replicate Bard et al. (2000), depending only on what the speaker has heard. Change between mentions was no greater when it could be inferred that the listener could see the named item (Expt 1), and no less when the listener explicitly denied ability to do so (Expt 2). Word duration fell for speaker-Given, listener-New items (Expt 3). Reduction was unaffected by the repeater’s ability to see the mentioned landmark (Expt 4). In contrast, referential form was more sensitive to both listener-knowledge (Expt 3) and speaker-knowledge (Expt 4). The results conform most closely to a Dual Process model: fast, automatic processes let the speaker-knowledge prime word articulation, while costly assessments of listener-knowledge influence only referential form.

Introduction

Speakers are said to design their utterances to suit the needs of their listeners, insofar as those needs can be known (Ariel, 1990; Clark & Marshall, 1981; Gundel, Hedberg, & Zacharski, 1993; Lindblom, 1990). Certainly, there is variation in form. Clarity of pronunciation varies with predictability from local context (Hunnicutt, 1985; Lieberman, 1963) and with repeated mention (Fowler & Housum, 1987).
Referential forms are syntactically simpler the more readily interpreted or ‘accessible’ their antecedents are (a blacksmith’s cottage vs. it) (Ariel, 1990; Fowler, Levy, & Brown, 1997; Gundel et al., 1993; Vonk, Hustinx, & Simmons, 1992). Yet maintaining an incrementally updated model of what the listener knows, what is established common ground, and what the listener needs to know is a considerable cognitive task. Because speaker’s and listener’s knowledge overlap and because it may be impossible to assess the latter accurately, speakers may default to an account of their own knowledge as a proxy for the listener’s (Clark & Marshall, 1981). In fact, many studies simply assume that the two are the same: they manipulate the speaker’s knowledge without independently manipulating the listener’s (see Keysar, 1997).

This paper compares two versions of the hypothesis that referring expressions are genuinely tailored to the addressee. One deals with the articulation of individual words, the other with the syntactic form of referring expressions. Under current models of language production, NP structure and articulation are generated within units of different sizes: intonational or syntactic phrases on the one hand, and phonological words, lexical words, or syllables on the other (Levelt & Wheeldon, 1994; Smith & Wheeldon, 1999; Wheeldon & Lahiri, 1997). Moreover, speech appears to be produced in a cascade, with a sequence of smaller units being prepared for articulation even as the succeeding larger unit is being designed. Thus, incrementally updating a listener model in order to articulate each phonological word appropriately would impose a much heavier computational burden than updating it phrase by phrase. Making both kinds of update for the processes running in parallel would be even more demanding, with the listener model operating both in the state appropriate to the most recently produced word and in the state created by the most recently planned phrase.

We will first develop existing hypotheses about how speakers model listeners while planning and producing speech. Then we will report four studies which test these hypotheses on materials from a single corpus. They follow the comparisons made by Bard et al. (2000) on a psychological measure of clarity, the intelligibility to naive listeners (recognition rate) of a balanced sample of excised spoken words. The present paper reports a phonetic measure of clarity (word duration), and a syntactic measure of referential form for all suitable cases in a dialogue corpus. Finally, we will discuss the implications of the comparison.

Matthew P. Aylett | Ellen Gurman Bard