Deaf and Hard-of-Hearing Perspectives on Imperfect Automatic Speech Recognition for Captioning One-on-One Meetings

Recent advances in Automatic Speech Recognition (ASR) have made this technology a potential solution for transcribing audio in real time for people who are Deaf or Hard of Hearing (DHH). However, ASR output is imperfect, and users must cope with errors in the resulting text. While some prior research has examined ASR-generated transcriptions as captions for DHH people, there has been no systematic study of how best to present captions that may contain ASR errors, nor of how to make use of an ASR system's word-level confidence scores. We conducted two studies, with 21 and 107 DHH participants, to compare various methods of visually presenting ASR output annotated with confidence values. Participants answered subjective preference questions and provided feedback on how ASR captioning could be combined with confidence-display markup. Participants preferred the captioning styles with which they were already most familiar (those that did not display confidence information), and they were concerned about the accuracy of ASR systems. While they expressed interest in systems that display word-level confidence in captions, they were concerned that changes in text appearance may be distracting. The findings of this study should be useful for researchers and companies developing automated captioning systems for DHH users.
