Interface design strategies for computer-assisted speech transcription

A set of user interface design techniques for computer-assisted speech transcription is presented and evaluated with respect to task performance and usability. These techniques include error-correction mechanisms that originated in dictation systems and audio editors, as well as new techniques that we developed. The new techniques exploit specific characteristics of existing speech recognition technologies to facilitate transcription in settings that typically yield considerable recognition inaccuracy, such as when the speech to be transcribed was produced by different speakers. In particular, we describe a mechanism for dynamic propagation of user feedback which progressively adapts the system to different speakers and lexical contexts. Results of usability and performance evaluation trials indicate that feedback propagation, menu-based correction coupled with keyboard interaction, and text-driven audio playback are positively perceived by users and result in improved transcript accuracy.
