Paper information - Understanding users' perception of speech recognition errors in mobile communication

Speech recognition errors remain a problem in the design of voice-based user interfaces. Owing to the limited system resources and constrained input methods, error correction is particularly difficult on mobile devices. This research investigated the users' perception of a proposed multimodal interface design that allows a user to send and receive voice-dictated text messages on cell phones. Task-based interviews were performed to examine the participants' understanding, acceptance and overall satisfaction. Findings indicate that an audio readout significantly improves the users' understanding of the misrecognised messages. An in-depth investigation reveals how users' perception is impacted by speech recognition errors in mobile communication.

Shuang Xu

[1] D. Swinney. Lexical access during sentence comprehension: (Re)consideration of context effects, 1979.

[2] G. C. Van Orden. A ROWS is a ROSE: Spelling, sound, and reading, 1987, Memory & cognition.

[3] B. Brinton, et al. Responses to requests for clarification by linguistically normal and language-impaired children in conversation, 1988, The Journal of speech and hearing disorders.

[4] N. Bell, et al. Gestalt imagery: A critical factor in language comprehension, 1991, Annals of dyslexia.

[5] Robert E. Kraut, et al. Expressive richness: a comparison of speech and text as media for revision, 1991, CHI.

[6] William A. Ainsworth, et al. Feedback Strategies for Error Correction in Speech Recognition Systems, 1992, Int. J. Man Mach. Stud.

[7] Dylan M. Jones, et al. Data-entry by voice: facilitating correction of misrecognitions, 1993.

[8] Chris Baber, et al. Interactive speech technology: human factors issues in the application of speech input/output to computers, 1993.

[9] R. Frost, et al. Phonetic recoding of phonologically ambiguous printed words, 1993, Journal of experimental psychology. Learning, memory, and cognition.

[10] A. Pollatsek, et al. Automatic access of semantic information by phonological codes in visual word recognition, 1993, Journal of experimental psychology. Learning, memory, and cognition.

[11] M. Turvey, et al. Visual lexical access is initially phonological: 1. Evidence from associative priming by words, homophones, and pseudohomophones, 1994, Journal of experimental psychology. General.

[12] G. C. Van Orden, et al. Interdependence of form and function in cognitive systems explains perception of printed words, 1994, Journal of experimental psychology. Human perception and performance.

[13] James H. Bradford, et al. The human factors of speech-based interfaces: a research agenda, 1995, SGCH.

[14] R. Frost, et al. Phonological computation and missing vowels: mapping lexical involvement in reading, 1995, Journal of experimental psychology. Learning, memory, and cognition.

[15] J. Ziegler, et al. Phonological Information Provides Early Sources of Constraint in the Processing of Letter Strings, 1995.

[16] Sharon L. Oviatt, et al. Error resolution during multimodal human-computer interaction, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[17] Sandrine Robbe, et al. Towards usable multimodal command languages: definition and ergonomic assessment of constraints on users' spontaneous speech and gestures, 1997, EUROSPEECH.

[18] C. Luo, et al. Automatic activation of phonological information in reading: Evidence from the semantic relatedness decision task, 1998, Memory & cognition.

[19] A. Pollatsek, et al. Evidence for the use of assembled phonology in accessing the meaning of printed words, 1998, Journal of experimental psychology. Learning, memory, and cognition.

[20] Alexander H. Waibel, et al. Model-based and empirical evaluation of multimodal interactive error correction, 1999, CHI '99.

[21] L. Tan, et al. Phonological Activation in Visual Identification of Chinese Two-Character Words, 1999.

[22] Sharon L. Oviatt, et al. Mutual disambiguation of recognition errors in a multimodal architecture, 1999, CHI '99.

[23] Gregory D. Abowd, et al. Error Correction Techniques for Handwriting, Speech, and Other Ambiguous or Error Prone Systems, 1999.

[24] Alexander H. Waibel, et al. Multimodal error correction for speech user interfaces, 2001, TCHI.

[25] Teddy Mantoro, et al. Location History in a Low-cost Context Awareness Environment, 2003, ACSW.

[26] Kevin Larson, et al. Speech Error Correction: The Story of the Alternates List, 2003, Int. J. Speech Technol.

[27] Emmanuel Munguia Tapia, et al. Acquiring in situ training data for context-aware ubiquitous computing applications, 2004, CHI.

[28] Sabine Deligne, et al. Pervasive Speech Recognition, 2004, IEEE Pervasive Comput.

[29] Michael F. McTear, et al. Handling errors and determining confirmation strategies - An object-based approach, 2003, Speech Commun.

[30] Alexander H. Waibel, et al. The connector: facilitating context-aware communication, 2005, ICMI '05.

[31] Encarna Segarra, et al. Error handling in a stochastic dialog system through confidence measures, 2005, Speech Commun.

[32] Henry Lieberman, et al. How to wreck a nice beach you sing calm incense, 2005, IUI.

[33] Plamen J. Prodanov, et al. Bayesian networks based multi-modality fusion for error handling in human-robot dialogues under noisy conditions, 2005, Speech Commun.

[34] Lou Boves, et al. Effective error recovery strategies for multimodal form-filling applications, 2005, Speech Commun.

[35] Xiaoyu Chen, et al. Patterns of Multimodal Input Usage in Non-Visual Information Navigation, 2006, Proceedings of the 39th Annual Hawaii International Conference on System Sciences (HICSS'06).

[36] A. Ant Ozok, et al. Short Messaging Service use among college students in USA and its potential as an educational tool: an exploratory study, 2007, Int. J. Mob. Learn. Organisation.

[37] Reggie Davidrajuh, et al. Array-based logic for realising inference engine in mobile applications, 2007, Int. J. Mob. Learn. Organisation.

[38] Christopher J. Brown, et al. Communities of practice in innovation management: sensemaking challenges to mobile organisations, 2007, Int. J. Mob. Learn. Organisation.

[39] Lorna Uden, et al. Activity theory for designing mobile learning, 2007, Int. J. Mob. Learn. Organisation.

[40] Yanjie Song, et al. SMS enhanced vocabulary learning for mobile audiences, 2008, Int. J. Mob. Learn. Organisation.