Correcting automatic speech recognition captioning errors in real time

Lectures can be digitally recorded and replayed to provide multimedia revision material for students who attended the class and a substitute learning experience for students unable to attend. Deaf and hard of hearing people can find it difficult to follow speech through hearing alone or to take notes while they are lip-reading or watching a sign-language interpreter. Synchronising the speech with text captions can ensure deaf students are not disadvantaged and can help all learners search for specific, relevant parts of the multimedia recording by means of the synchronised text. Automatic speech recognition has been used to provide real-time captioning directly from lecturers’ speech in classrooms, but it has proved difficult to obtain accuracy comparable to stenography. This paper describes the development, testing and evaluation of a system that enables editors to correct errors in the captions as they are produced by automatic speech recognition, and makes suggestions for possible future improvements.
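The paper's own implementation is not reproduced here, but as an illustration of the general editing model it describes, the following is a minimal Python sketch (all class and method names are hypothetical) of one way such a correction buffer might work: each recognised word is held for a short editing window in which a human editor can revise or delete it, and is only released to the caption display once that window expires.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CaptionWord:
    text: str
    timestamp: float        # seconds from the start of the lecture audio
    corrected: bool = False


class CaptionEditorBuffer:
    """Holds recognised words for a short editing window so a human
    editor can correct them before they are released for display.
    (Hypothetical sketch, not the system described in the paper.)"""

    def __init__(self, edit_window: float = 5.0):
        self.edit_window = edit_window   # seconds a word stays editable
        self.pending = deque()           # words awaiting release
        self.displayed = []              # words already shown to viewers

    def ingest(self, text: str, timestamp: float) -> None:
        """Called once per word emitted by the recogniser."""
        self.pending.append(CaptionWord(text, timestamp))

    def correct(self, index: int, new_text: str) -> None:
        """Editor replaces (or, with an empty string, deletes) a pending word."""
        word = self.pending[index]
        word.text = new_text
        word.corrected = True

    def flush(self, now: float) -> list:
        """Release every word whose editing window has expired."""
        released = []
        while self.pending and now - self.pending[0].timestamp >= self.edit_window:
            released.append(self.pending.popleft())
        self.displayed.extend(released)
        return released


# Simulated recogniser output containing one misrecognition.
buffer = CaptionEditorBuffer(edit_window=2.0)
for t, word in enumerate(["speech", "wreck", "ignition", "is", "useful"]):
    buffer.ingest(word, timestamp=float(t))

buffer.correct(1, "recognition")   # the editor fixes the misrecognised word
buffer.correct(2, "")              # and deletes the spurious fragment

for word in buffer.flush(now=10.0):
    if word.text:
        print(word.text + (" [edited]" if word.corrected else ""))
```

The obvious design tension in any scheme of this kind is that a longer editing window gives the editor more opportunity to correct errors but delays the captions reaching the viewer.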
