Electrophysiology of auditory-visual speech integration

Twenty-six native English speakers identified auditory (A), visual (V), and congruent and incongruent auditory-visual (AV) syllables while undergoing electroencephalography (EEG) in three experiments. In Experiment 1, unimodal (A, V) and bimodal (AV) stimuli were presented in separate blocks. In Experiment 2, the same stimuli were pseudo-randomized within the same blocks, providing a replication of Experiment 1 while testing the effect of participants' expectancy on the AV condition. In Experiment 3, McGurk fusion (audio /pa/ dubbed onto visual /ka/, eliciting the percept /ta/) and combination (audio /ka/ dubbed onto visual /pa/) stimuli were tested under visual attention [1]. EEG recordings show early effects of visual influence on auditory event-related potentials (the P1/N1/P2 complex). Specifically, a robust amplitude reduction of the N1/P2 complex was observed (Experiments 1 and 2) that could not be accounted for solely by attentional effects (Experiment 3). The N1/P2 reduction was accompanied by a temporal facilitation (approximately 20 ms) of the P1/N1 and N1/P2 transitions in AV conditions. Additionally, incongruent syllables showed a different profile from congruent AV /ta/ over a large latency range (~50 to 350 ms post-auditory onset), which was influenced by how accurately the visual stimuli were identified when presented unimodally. Our results suggest that (i) auditory processing is modulated early on by visual speech inputs, consistent with an early locus of AV speech interaction; (ii) the natural precedence of visual kinematics facilitates auditory speech processing in the time domain; and (iii) the degree of temporal gain is a function of the saliency of visual speech inputs.
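To make the reported measures concrete, the sketch below (Python with NumPy) shows how an N1/P2 amplitude reduction and a ~20 ms latency facilitation could be quantified by windowed peak picking on A-alone versus AV waveforms. This is a minimal illustration on simulated traces: the component parameters, search windows, and waveform shapes are assumptions for demonstration, not the recorded data or the analysis pipeline of these experiments.

```python
import numpy as np

# Time axis: 0-400 ms post auditory onset at 1 kHz sampling.
fs = 1000
t = np.arange(0, 0.4, 1 / fs)  # seconds

def gauss(t, mu, sigma, amp):
    """Gaussian bump used as a stand-in for one ERP component."""
    return amp * np.exp(-0.5 * ((t - mu) / sigma) ** 2)

# Simulated A-alone ERP (microvolts): P1 (~50 ms), N1 (~100 ms), P2 (~200 ms).
erp_a = (gauss(t, 0.05, 0.01, 1.0)
         + gauss(t, 0.10, 0.015, -4.0)
         + gauss(t, 0.20, 0.03, 3.0))
# Simulated AV ERP: N1/P2 reduced in amplitude and shifted ~20 ms earlier,
# mimicking the visually induced suppression and temporal facilitation.
erp_av = (gauss(t, 0.05, 0.01, 0.9)
          + gauss(t, 0.08, 0.015, -3.0)
          + gauss(t, 0.18, 0.03, 2.2))

def peak(erp, t, t_min, t_max, polarity):
    """Return (latency in ms, signed amplitude) of the most extreme
    point of the given polarity within a search window."""
    mask = (t >= t_min) & (t <= t_max)
    i = np.argmax(erp[mask] * polarity)  # flip sign so we always maximize
    return t[mask][i] * 1000, erp[mask][i]

# Illustrative search windows: N1 in 60-140 ms (negative), P2 in 150-260 ms (positive).
for name, (t0, t1, pol) in {"N1": (0.06, 0.14, -1), "P2": (0.15, 0.26, +1)}.items():
    lat_a, amp_a = peak(erp_a, t, t0, t1, pol)
    lat_av, amp_av = peak(erp_av, t, t0, t1, pol)
    print(f"{name}: A {amp_a:+.2f} uV @ {lat_a:.0f} ms | "
          f"AV {amp_av:+.2f} uV @ {lat_av:.0f} ms | "
          f"reduction {abs(amp_a) - abs(amp_av):.2f} uV, "
          f"facilitation {lat_a - lat_av:.0f} ms")
```

On these simulated traces the script reports an N1/P2 amplitude reduction of about 1 uV and a latency facilitation of about 20 ms per component, mirroring the direction of the effects described above.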

[1] S. Greenberg et al., Speech intelligibility derived from asynchronous processing of auditory-visual information, AVSP, 2001.

[2] J. J. Foxe et al., Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study, Brain Research: Cognitive Brain Research, 2002.

[3] K. S. Helfer et al., Auditory and auditory-visual perception of clear and conversational speech, Journal of Speech, Language, and Hearing Research, 1997.

[4] C. N. Guy et al., The parallel visual motion inputs into areas V1 and V5 of human cerebral cortex, Brain, 1995.

[5] E. Bullmore et al., Activation of auditory cortex during silent lipreading, Science, 1997.

[6] R. Campbell et al., Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex, Current Biology, 2000.

[7] R. Hari et al., Seeing speech: visual information from lip movements modifies activity in the human auditory cortex, Neuroscience Letters, 1991.

[8] H. McGurk et al., Hearing lips and seeing voices, Nature, 1976.

[9] J. Pernier et al., Dynamics of cortico-subcortical cross-modal operations involved in audio-visual object detection in humans, Cerebral Cortex, 2002.

[10] K. Grant et al., Auditory-visual speech recognition by hearing-impaired subjects: consonant recognition, sentence recognition, and auditory-visual integration, Journal of the Acoustical Society of America, 1998.

[11] S. Rosen, Temporal information in speech: acoustic, auditory and linguistic aspects, Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 1992.

[12] E. Iwai et al., Neuronal activity in visual, auditory and polysensory areas in the monkey temporal cortex during visual fixation task, Brain Research Bulletin, 1991.

[13] Q. Summerfield, Lipreading and audio-visual speech perception, Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences, 1992.

[14] H. Kennedy et al., Anatomical Evidence of Multimodal Integration in Primate Striate Cortex, Journal of Neuroscience, 2002.

[15] P. Gribble et al., Temporal constraints on the McGurk effect, Perception & Psychophysics, 1996.

[16] P. Baudonniere et al., Evidence of a visual-to-auditory cross-modal sensory gating phenomenon as reflected by the human P50 event-related brain potential modulation, Neuroscience Letters, 2003.

[17] I. Winkler et al., Organizing sound sequences in the human brain: the interplay of auditory streaming and temporal integration, Brain Research, 2001.

[18] R. Näätänen, Attention and Brain Function, 1992.

[19] G. Celesia, Organization of auditory cortical areas in man, Brain, 1976.

[20] G. Plant, Perceiving Talking Faces: From Speech Perception to a Behavioral Principle, 1999.

[21] M. Giard et al., Auditory-Visual Integration during Multimodal Object Recognition in Humans: A Behavioral and Electrophysiological Study, Journal of Cognitive Neuroscience, 1999.

[22] B. Stein et al., The Merging of the Senses, 1993.

[23] D. W. Massaro et al., Perception of asynchronous and conflicting visual and auditory speech, Journal of the Acoustical Society of America, 1996.

[24] P. Deltenre et al., Mismatch negativity evoked by the McGurk–MacDonald effect: a phonetic representation within short-term memory, Clinical Neurophysiology, 2002.

[25] B. Yvert et al., Simultaneous intracerebral EEG recordings of early auditory thalamic and cortical activity in human, European Journal of Neuroscience, 2002.

[26] S. Greenberg et al., The temporal properties of spoken Japanese are similar to those of English, EUROSPEECH, 1997.

[27] D. Poeppel, The analysis of speech in different temporal integration windows: cerebral lateralization as 'asymmetric sampling in time', Speech Communication, 2003.

[28] W. H. Sumby et al., Visual contribution to speech intelligibility in noise, 1954.

[29] A. MacLeod et al., A procedure for measuring auditory and audio-visual speech-reception thresholds for sentences in noise: rationale, evaluation, and recommendations for use, British Journal of Audiology, 1990.