Linguistic Adaptations in Spoken Human-Computer Dialogues: Empirical Studies of User Behavior

This thesis addresses the question of how speakers adapt their language when they interact with a spoken dialogue system. In human–human dialogue, people continuously adapt to their conversational partners at different levels. When interacting with computers, speakers also to some extent adapt their language to meet (what they believe to be) the constraints of the dialogue system. Furthermore, if a problem occurs in the human–computer dialogue, patterns of linguistic adaptation are often accentuated.

In this thesis, we used an empirical approach in which a series of corpora of human–computer interaction were collected and analyzed. The systems used for data collection included both fully functional stand-alone systems in public settings, and simulated systems in controlled laboratory environments. All of the systems featured animated talking agents, and encouraged users to interact using unrestricted spontaneous language. Linguistic adaptation in the corpora was examined at the phonetic, prosodic, lexical, syntactic and pragmatic levels.

Knowledge about users’ linguistic adaptations can be useful in the development of spoken dialogue systems. If we are able to adequately describe their patterns of occurrence (at the different linguistic levels at which they occur), we will be able to build more precise user models, thus improving system performance. Our knowledge of linguistic adaptations can be useful in at least two ways: first, it has been shown that linguistic adaptations can be used to identify (and subsequently repair) errors in human–computer dialogue. Second, we can try to subtly influence users to behave in a certain way, for instance by implicitly encouraging a speaking style that improves speech recognition performance.
