There was a long pause: influencing turn-taking behaviour in human-human and human-computer spoken dialogues

We report an experiment designed to compare human-human spoken dialogues with human-computer spoken dialogues. Our primary purpose was to collect data on protocols that were used to control the interaction. Three groups of subjects (35 total) were each asked to complete tasks over the phone. The experimental procedure was a new variation on the Wizard of Oz simulation technique that allowed much clearer comparisons to be made between human-human and human-computer interactions.

Previous studies have shown that there are significant differences between human-human and human-computer interactions. While some effects can be attributed to the beliefs about computers the subjects bring to the task, others appear to be connected with the ongoing interaction styles of the speakers. Our study focuses on effects created by differences in interaction style. An important feature of the study is the use of two wizards, a technique which resulted in a realistically degraded communication channel.

A second important feature of this study is the emphasis on computational models of spoken dialogue processing. One of the aims of Wizard of Oz studies is to identify language restrictions that will make the understanding task easier yet still be acceptable to the users. We observed that subjects could indeed successfully carry out their task with a restricted turn-taking protocol. More importantly, however, the experiment pointed us in the direction of a less restricted protocol and provided the data for a more sophisticated computational model of turn-taking.

An important aspect of our study is the light it appears to shed on conflicting results in the literature. We discuss how these conflicts can be explained in terms of differences in interlocution style. We argue that ongoing interlocution style has a significant effect on the dialogue and over-rides a priori models of interlocutor ability.
