Auditory speech detection in noise enhanced by lipreading

Abstract Audiovisual speech stimuli produce a variety of perceptual phenomena. One of these is the enhanced detectability of acoustic speech in noise when the talker can also be seen. This study investigated whether this enhancement effect is specific to visual speech stimuli or can also be driven by more generic, non-speech visual stimulus properties. Detection thresholds for an auditory /ba/ stimulus were obtained in a white-noise masker, with the stimulus level varied adaptively to estimate the 79.4%-correct detection threshold under five conditions. In Experiment 1, the syllable was presented (1) auditory-only (AO) and (2) as audiovisual speech (AVS), using the original video recording. Three types of synthetic visual stimuli were also paired synchronously with the audio token: (3) a dynamic Lissajous figure (AVL) whose vertical extent was correlated with the acoustic speech envelope; (4) a dynamic rectangle (AVR) whose horizontal extent was correlated with the speech envelope; and (5) a static rectangle (AVSR) whose onset and offset were synchronous with the acoustic speech onset and offset. Ten adults with normal hearing and vision participated. The results, in terms of dB signal-to-noise ratio (SNR), were AVS
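The 79.4%-correct target mentioned in the abstract corresponds to the classic 3-down/1-up transformed up-down rule (Levitt, 1971): lower the SNR after three consecutive correct responses, raise it after any miss, and estimate the threshold from the SNRs at which the track reverses direction. A minimal sketch of such a staircase, assuming a hypothetical `respond(snr)` callback that reports whether the listener detected the syllable at a given SNR (the trial logic here is illustrative, not the study's actual procedure):

```python
# Minimal 3-down/1-up adaptive staircase (converges on ~79.4% correct).
# `respond(snr)` is a hypothetical callback returning True on a correct
# detection at that SNR; everything here is an illustrative sketch.

def run_staircase(respond, start_snr=0.0, step_db=2.0, max_reversals=8):
    """Track SNR with a 3-down/1-up rule; return the mean of the
    last half of the reversal SNRs as the threshold estimate."""
    snr = start_snr
    consecutive_correct = 0
    last_direction = 0            # -1 = last moved down, +1 = last moved up
    reversals = []
    while len(reversals) < max_reversals:
        if respond(snr):
            consecutive_correct += 1
            if consecutive_correct == 3:      # 3 correct in a row -> harder
                consecutive_correct = 0
                if last_direction == +1:      # direction change = reversal
                    reversals.append(snr)
                last_direction = -1
                snr -= step_db
        else:                                 # any miss -> easier
            consecutive_correct = 0
            if last_direction == -1:
                reversals.append(snr)
            last_direction = +1
            snr += step_db
    tail = reversals[len(reversals) // 2:]
    return sum(tail) / len(tail)

# Deterministic listener that detects the syllable at SNRs of -10 dB and above:
threshold = run_staircase(lambda snr: snr >= -10.0)
print(threshold)   # track oscillates between -10 and -12 dB; estimate is -11.0
```

Averaging only the last half of the reversals discards the initial convergence phase of the track, so the estimate reflects the region where the staircase oscillates around threshold.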
