The Influence of Age, Hearing, and Working Memory on the Speech Comprehension Benefit Derived from an Automatic Speech Recognition System

Objective: The aim of the current study was to examine whether partly incorrect subtitles, automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al., 2008), we showed that presenting partly incorrect, automatically generated subtitles improves speech comprehension in noise by young listeners with normal hearing. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles while listening to speech in noise.

Design: To investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years), and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured with Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, and visual working memory capacity on the one hand and the audiovisual benefit and listening effort on the other.

Results: The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit from the subtitles. Speech comprehension improved even at relatively low ASR accuracy levels; for example, participants obtained an audiovisual benefit of about 2 dB SNR for ASR accuracies around 74%. Delaying the presentation of the text reduced the benefit and increased the listening effort. Participants with relatively low unimodal speech comprehension obtained greater benefit from the subtitles than participants with better unimodal speech comprehension. We observed an age-related decline in the working memory capacity of the listeners with normal hearing. Higher age and lower working memory capacity were associated with greater effort required to use the subtitles to improve speech comprehension.

Conclusions: Participants were able to use partly incorrect and delayed subtitles to increase their comprehension of speech in noise, regardless of age and hearing loss. This supports the further development and evaluation of an assistive listening system that displays automatically recognized speech to aid speech comprehension by listeners with hearing impairment.
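The audiovisual benefit defined in the Design section can be illustrated with a minimal sketch. The abstract states only that the benefit is the difference between the auditory and the audiovisual SRT; the sign convention used here (auditory minus audiovisual, so that a positive value means the subtitles helped, because a lower SRT indicates better comprehension in noise) is an assumption. The function name and the input values are hypothetical and are not data from the study.

```python
def audiovisual_benefit(auditory_srt_db: float, audiovisual_srt_db: float) -> float:
    """Return the audiovisual benefit in dB SNR.

    Assumed convention: auditory SRT minus audiovisual SRT, so a positive
    result means the subtitles lowered (improved) the threshold.
    """
    return auditory_srt_db - audiovisual_srt_db


# Hypothetical SRTs for one listener (illustrative values only).
auditory_srt = -4.0      # dB SNR, speech only
audiovisual_srt = -6.0   # dB SNR, speech plus subtitles

benefit = audiovisual_benefit(auditory_srt, audiovisual_srt)
print(f"Audiovisual benefit: {benefit:.1f} dB SNR")
```

Under this convention, a listener whose threshold drops from -4 dB SNR to -6 dB SNR with subtitles obtains a benefit of 2 dB SNR.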

[1] Tammo Houtgast et al. Auditory and nonauditory factors affecting speech reception in noise by older listeners. 2007, The Journal of the Acoustical Society of America.

[2] C S Watson et al. Individual differences in the processing of speech and nonspeech sounds by normal-hearing listeners. 2001, The Journal of the Acoustical Society of America.

[3] P. Carpenter et al. Lexical retrieval and error recovery in reading: A model based on eye fixations. 1981.

[4] R. Sekuler et al. Contrast sensitivity throughout adulthood. 1982, Vision Research.

[5] Tammo Houtgast et al. The Benefit Obtained from Visually Displayed Text from an Automatic Speech Recognizer During Listening to Speech Presented in Noise. 2008, Ear and hearing.

[6] K Spens et al. Vibrotactile speech tracking support: cognitive prerequisites. 1998, Journal of deaf studies and deaf education.

[7] A Wingfield et al. Age differences in processing information from television news: the effects of bisensory augmentation. 1990, Journal of gerontology.

[8] P. Rabbitt et al. A study of performance on tests from the CANTAB battery sensitive to frontal lobe dysfunction in a large sample of normal volunteers: Implications for theories of executive functioning and cognitive aging. 1998, Journal of the International Neuropsychological Society.

[9] A. Macleod et al. Quantifying the contribution of vision to speech perception in noise. 1987, British journal of audiology.

[10] A. M. Mimpen et al. Improving the reliability of testing the speech reception threshold for sentences. 1979, Audiology : official organ of the International Society of Audiology.

[11] Edgar Erdfelder et al. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. 2007, Behavior research methods.

[12] A Baddeley et al. The fractionation of working memory. 1996, Proceedings of the National Academy of Sciences of the United States of America.

[13] Sara H. Basson et al. Speech recognition in university classrooms: liberated learning project. 2002, Assets '02.

[14] S. Arlinger et al. Speech understanding in quiet and noise, with and without hearing aids. 2005, International journal of audiology.

[15] T Houtgast et al. Method for the selection of sentence materials for efficient measurement of the speech reception threshold. 1999, The Journal of the Acoustical Society of America.

[16] Thomas Lunner et al. Recognition of speech in noise with new hearing instrument compression release settings requires explicit cognitive storage and processing capacity. 2007, Journal of the American Academy of Audiology.

[17] P F Seitz et al. Assessing the Cognitive Demands of Speech Listening for People with Hearing Losses. 1996, Ear and hearing.

[18] M. Daneman et al. Working memory and language comprehension: A meta-analysis. 1996, Psychonomic bulletin & review.

[19] M. K. Pichora-Fuller. Processing speed and timing in aging adults: psychoacoustics, speech perception, and comprehension. 2003, International journal of audiology.

[20] T. Lunner et al. Cognition counts: A working memory system for ease of language understanding (ELU). 2008, International journal of audiology.

[21] Thomas Lunner et al. Cognitive function in relation to hearing aid use. 2003, International journal of audiology.

[22] Tammo Houtgast et al. The development of the text reception threshold test: a visual analogue of the speech reception threshold test. 2007, Journal of speech, language, and hearing research : JSLHR.

[23] P. Carpenter et al. Individual differences in working memory and reading. 1980.

[24] Stig Arlinger et al. Negative consequences of uncorrected hearing loss—a review. 2003, International journal of audiology.

[25] M. C. Fastame et al. Working memory components of the Corsi blocks task. 2004, British journal of psychology.

[26] A Wingfield et al. Cognitive factors in auditory performance: context, speed of processing, and constraints of memory. 1996, Journal of the American Academy of Audiology.

[27] J. Rönnberg. Cognition in the hearing impaired and deaf as a bridge between signal and dialogue: a framework and a model. 2003, International journal of audiology.

[28] M. Daneman et al. How young and old adults listen to and remember speech in noise. 1995, The Journal of the Acoustical Society of America.

[29] L Hickson et al. Candidature for and delivery of audiological services: special needs of older people. 2003, International journal of audiology.

[30] P. Kricos et al. Audiologic Management of Older Adults With Hearing Loss and Compromised Cognitive/Psychoacoustic Auditory Processing Capabilities. 2006, Trends in amplification.

[31] Ayanna K. Thomas et al. Theoretical Perspectives on Cognitive Aging. 2019, Handbook of Medical Neuropsychology.

[32] R. Plomp. A signal-to-noise ratio model for the speech-reception threshold of the hearing impaired. 1986, Journal of speech and hearing research.

[33] M. Boxtel et al. Mild Hearing Impairment Can Reduce Verbal Memory Performance in a Healthy Adult Population. 2000, Journal of clinical and experimental neuropsychology.

[34] Graham Naylor et al. Linear and nonlinear hearing aid fittings – 2. Patterns of candidature. 2006, International journal of audiology.

[35] Birgitta Larsby et al. Cognitive performance and perceived effort in speech processing tasks: effects of different noise backgrounds in normal-hearing and hearing-impaired subjects. 2005, International journal of audiology.

[36] Graham Naylor et al. Benefits from hearing aids in relation to the interaction between the user and the environment. 2003, International journal of audiology.

[37] T. Robbins et al. Planning and spatial working memory following frontal lobe lesions in man. 1990, Neuropsychologia.