Understanding users' perception of speech recognition errors in mobile communication

Speech recognition errors remain a problem in the design of voice-based user interfaces. Owing to limited system resources and constrained input methods, error correction is particularly difficult on mobile devices. This research investigated users' perception of a proposed multimodal interface design that allows a user to send and receive voice-dictated text messages on cell phones. Task-based interviews were performed to examine the participants' understanding, acceptance and overall satisfaction. Findings indicate that an audio readout significantly improves users' understanding of misrecognised messages. An in-depth investigation reveals how speech recognition errors affect users' perception in mobile communication.
