Accessibility Evaluation of Classroom Captions

Real-time captioning enables deaf and hard of hearing (DHH) people to follow classroom lectures and other aural speech by converting the speech into visual text with a delay of under five seconds. Keeping the delay short allows end-users to follow and participate in conversations. This article focuses on the fundamental problem that makes real-time captioning difficult: sequential keyboard typing is much slower than speaking. We first surveyed the audio characteristics of 240 one-hour-long captioned lectures on YouTube, such as the speed and duration of speaking bursts. We then analyzed how these characteristics affect caption generation and readability, with a specific focus on our human-powered collaborative captioning approach. We note that most of these characteristics are also present in more general domains. For our caption comparison evaluation, we transcribed a classroom lecture in real time using three captioning approaches: collaborative non-expert captioning, automatic speech recognition (ASR), and professional captioning. We recruited 48 participants (24 DHH) to watch these captioned lectures in an eye-tracking laboratory, presenting the captions in a randomized, balanced order. We show that both hearing and DHH participants preferred and followed collaborative captions better than those generated by ASR or professionals, due to the more consistent flow of the resulting captions. These results show the potential of collaborative captioning to reliably capture speech even during sudden bursts of speed, and, unlike other human-powered captioning approaches, to generate "enhanced" captions.
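The core idea behind collaborative captioning is that several non-expert typists each capture overlapping fragments of the speech, and their partial captions are merged into one stream. The sketch below illustrates this merging step in a deliberately simplified form: each worker's output is modeled as timestamped words, and the merge sorts all words by time and drops immediate duplicates where two workers captured the same word. The function name and the timestamp-plus-dedup scheme are illustrative assumptions, not the alignment algorithm used in the actual system.

```python
from itertools import chain

def merge_partial_captions(streams):
    """Merge per-worker streams of (timestamp, word) pairs into one caption.

    Simplified illustration: sort every timed word across all workers,
    then drop a word when it repeats its immediate predecessor (i.e.,
    two workers typed the same word at nearly the same moment).
    """
    timed = sorted(chain.from_iterable(streams), key=lambda tw: tw[0])
    merged = []
    for _, word in timed:
        if not merged or merged[-1] != word:
            merged.append(word)
    return " ".join(merged)

# Two non-expert workers, each covering overlapping bursts of speech.
worker_a = [(0.0, "real-time"), (0.6, "captioning"), (1.4, "enables")]
worker_b = [(0.7, "captioning"), (1.5, "enables"), (2.1, "access")]
print(merge_partial_captions([worker_a, worker_b]))
# → real-time captioning enables access
```

In practice, a real merger must align words without reliable shared timestamps (e.g., via multiple sequence alignment) and tolerate typos, but even this toy version shows why several slow typists can jointly keep up with fast speech: no single worker has to capture every word.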
