The contribution of visual information to the perception of speech in noise with and without informative temporal fine structure

Understanding what is said in demanding listening situations is greatly assisted by looking at the face of the talker. Previous studies have shown that normal-hearing listeners can benefit from this visual information when a talker's voice is presented in background noise. These benefits have also been observed in quiet listening conditions in cochlear-implant users, whose devices do not convey the informative temporal fine structure cues in speech, and in normal-hearing individuals listening to speech processed to remove these cues. The current study (1) characterised the benefits of visual information when listening in background noise and (2) used sine-wave vocoding to compare the size of the visual benefit when speech is presented with or without informative temporal fine structure. The accuracy with which normal-hearing individuals reported words in spoken sentences was assessed across three experiments, in which the availability of visual information and of informative temporal fine structure cues was varied within and across experiments. A visual benefit was observed with both open- and closed-set tests of speech perception, and the size of the benefit increased when informative temporal fine structure cues were removed. This finding suggests that visual information may play an important role in the ability of cochlear-implant users to understand speech in many everyday situations. Models of audio-visual integration accounted for the additional benefit of visual information when speech was degraded and indicated that auditory and visual information was integrated in a similar way in all conditions. The modelling results were consistent with the notion that audio-visual benefit is derived from the optimal combination of auditory and visual sensory cues.
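
The abstract does not report the vocoder parameters, but the manipulation it describes can be illustrated with a short sketch. Below is a minimal, hypothetical sine-wave vocoder in Python; the channel count, sampling rate, filter orders, and cutoff frequencies are illustrative assumptions rather than the study's settings. Each analysis band's temporal envelope is extracted and re-imposed on a sine carrier at the band's centre frequency, so the envelope is preserved while the original temporal fine structure is discarded.

```python
# Minimal sketch of a sine-wave vocoder (illustrative parameters only).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def sine_vocode(signal, fs=16000, n_channels=8, f_lo=100.0, f_hi=7000.0):
    """Replace temporal fine structure with sine carriers, keeping band envelopes."""
    # Channel edges spaced logarithmically across the speech band.
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    t = np.arange(len(signal)) / fs
    out = np.zeros(len(signal), dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Band-limit the input to this analysis channel.
        sos_band = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos_band, signal)
        # Extract the temporal envelope (Hilbert magnitude, then low-pass smoothing).
        env = np.abs(hilbert(band))
        sos_env = butter(2, 50.0, btype="lowpass", fs=fs, output="sos")
        env = sosfiltfilt(sos_env, env)
        # Re-impose the envelope on a sine carrier at the channel centre frequency,
        # discarding the channel's original fine structure.
        fc = np.sqrt(lo * hi)
        out += env * np.sin(2.0 * np.pi * fc * t)
    return out
```

The logarithmic channel spacing and geometric-mean carrier frequencies are common design choices that roughly follow cochlear frequency spacing; an actual implementation would match the channel boundaries and envelope cutoff to the processing strategy being simulated.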
