Third-party error detection support mechanisms for dictation speech recognition

Although speech recognition has improved significantly in recent years, its adoption continues to be limited, in part, by the effort and frustration associated with correcting speech recognition errors. Error detection is particularly challenging in third-party error correction, where one individual dictates and a different individual corrects the resulting text. This research aims to address the difficulty of third-party error detection by developing and evaluating a variety of support mechanisms. Drawing on a growing body of literature on human-computer interaction and speech recognition, four support mechanisms were designed and evaluated: indexed audio, speech summarization, error prediction, and the presentation of alternative hypotheses. A user study assessed the impact of these support mechanisms on both performance and perceptions during error detection tasks. Performance measures included effectiveness and efficiency; perception measures included confidence, perceived usefulness, and cognitive workload. The results provide strong support for the use of indexed audio in the context of third-party error detection. The results also confirm that consecutive error rate, defined as the percentage of recognition errors immediately adjacent to another error, has a negative impact on the effectiveness of third-party error detection. The other support mechanisms failed to improve either effectiveness or perceptions, but they did offset the negative impact of an increasing consecutive error rate. These findings have significant implications for speech recognition error detection research and for the design of error detection support solutions.
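
The consecutive error rate defined above can be illustrated with a minimal sketch. It assumes the recognizer output has already been aligned against a reference transcript so that each word carries an error flag; that per-word boolean representation and the function name `consecutive_error_rate` are assumptions made for illustration, not details taken from the study.

```python
from typing import Sequence


def consecutive_error_rate(is_error: Sequence[bool]) -> float:
    """Return the percentage of errors that sit directly next to another error.

    `is_error[i]` is True when the i-th recognized word is an error
    (hypothetical input format, assumed for this sketch).
    """
    error_positions = [i for i, flag in enumerate(is_error) if flag]
    if not error_positions:
        # No errors at all: define the rate as 0 to avoid dividing by zero.
        return 0.0

    adjacent = 0
    for i in error_positions:
        left_is_error = i > 0 and is_error[i - 1]
        right_is_error = i + 1 < len(is_error) and is_error[i + 1]
        if left_is_error or right_is_error:
            adjacent += 1

    return 100.0 * adjacent / len(error_positions)


# Six recognized words; errors at positions 1, 2, and 4.
# Positions 1 and 2 are adjacent to each other, position 4 is isolated.
flags = [False, True, True, False, True, False]
rate = consecutive_error_rate(flags)
```

Under this reading, two of the three errors have an erroneous neighbor, so the rate is two thirds of the errors expressed as a percentage. A transcript with only isolated errors yields a rate of zero.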
