Constraining User Response via Multimodal Dialog Interface

This paper presents the results of an experiment comparing two designs of an automated dialog interface: a multimodal design that coordinates text displays with spoken prompts, and a voice-only version of the same application. Our results show that the text-coordinated version yields higher word-recognition accuracy and fewer out-of-grammar responses, while matching the voice-only version in user satisfaction. We argue that this type of multimodal dialog interface effectively constrains user responses, enabling better speech recognition without increasing cognitive load or compromising the naturalness of the interaction.
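
The mechanism can be sketched in a few lines: the options shown on the coordinated text display define the active recognition grammar for that turn, so the recognizer only has to distinguish among the displayed choices. The following is a minimal illustrative sketch, not from the paper; PromptTurn, active_grammar, and recognize are hypothetical names, and plain string matching stands in for a real recognizer scoring acoustic hypotheses against a compiled grammar.

```python
# Hypothetical sketch (not from the paper): the options shown on the
# coordinated text display define the active recognition grammar for
# that turn, so off-list utterances are flagged as out-of-grammar and
# reprompted instead of being force-matched to the nearest option.
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class PromptTurn:
    spoken_prompt: str            # played via TTS in a real system
    displayed_options: List[str]  # shown on the coordinated text display


def active_grammar(turn: PromptTurn) -> Set[str]:
    """The recognition grammar is exactly the set of displayed options."""
    return {opt.lower() for opt in turn.displayed_options}


def recognize(utterance: str, turn: PromptTurn) -> Tuple[Optional[str], bool]:
    """Return (match, in_grammar). A real recognizer would score acoustic
    hypotheses against the compiled grammar; string matching stands in
    for that here."""
    normalized = utterance.strip().lower()
    if normalized in active_grammar(turn):
        return normalized, True
    return None, False


if __name__ == "__main__":
    turn = PromptTurn(
        spoken_prompt="Which department would you like?",
        displayed_options=["Sales", "Support", "Billing"],
    )
    for said in ["support", "the billing people"]:
        match, ok = recognize(said, turn)
        print(f"{said!r} -> " + (f"matched {match!r}" if ok else "out-of-grammar, reprompt"))
```

In a deployed IVR, the per-turn grammar would be compiled for the speech engine, and an out-of-grammar result would trigger a reprompt rather than a forced match.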
