Multisensory Integration in Speech Processing: Neural Mechanisms of Cross-Modal Aftereffects

Traditionally, perceptual neuroscience has focused on unimodal information processing. The same holds for investigations of speech processing, where the auditory modality was the natural focus of interest. Given the complexity of neuronal processing, this restriction was a sensible simplification for a field still in its infancy. It does not, however, do justice to the way we perceive the world in everyday interactions. Sensory information is rarely confined to a single modality. Instead, we are constantly confronted with a stream of input to several or all senses, and already in infancy we match facial movements with their corresponding sounds (Campbell et al. 2001; Kuhl and Meltzoff 1982). Moreover, the information processed by the individual senses does not remain separated: the different channels interact and influence each other, shaping perceptual interpretations and constructions (Calvert 2001). Consequently, over the last 15–20 years, the perspective in cognitive science and perceptual neuroscience has broadened to include such multimodal integrative phenomena.

Facilitating cross-modal effects have been demonstrated consistently in behavioral studies (Shimojo and Shams 2001). When multisensory input is congruent (e.g., semantically and/or temporally), it typically lowers detection thresholds (Frassinetti et al. 2002), shortens reaction times (Forster et al. 2002; Schröger and Widmann 1998), and decreases saccadic eye-movement latencies (Hughes et al. 1994) relative to unimodal exposure. Conversely, when incongruent input is (artificially) added in a second modality, performance typically suffers (Sekuler et al. 1997).
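
A compact way to see why congruent bimodal input improves precision is the maximum-likelihood cue-combination account (Ernst and Banks 2002; Alais and Burr 2004): each modality contributes an estimate weighted by its reliability (inverse variance), and the fused estimate has lower variance than either unimodal one. The minimal Python sketch below illustrates this for localizing a speaker; the parameter values are illustrative assumptions, not data from any of the cited studies.

    from math import sqrt

    # Minimal sketch of reliability-weighted (maximum-likelihood) cue
    # integration. All numbers are hypothetical; this illustrates the
    # model, not any specific data set.

    def integrate_cues(mu_a, sigma_a, mu_v, sigma_v):
        """Fuse auditory and visual Gaussian estimates of one stimulus property."""
        w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)  # auditory reliability weight
        mu_av = w_a * mu_a + (1 - w_a) * mu_v                       # fused mean
        var_av = 1 / (1 / sigma_a**2 + 1 / sigma_v**2)              # fused variance
        return mu_av, sqrt(var_av)

    # Locating a speaker: audition is spatially coarse (sigma = 8 deg),
    # vision is precise (sigma = 2 deg). The fused estimate is pulled
    # toward the visual cue (a ventriloquist-like bias) and is more
    # precise than either single cue.
    mu_av, sigma_av = integrate_cues(mu_a=10.0, sigma_a=8.0, mu_v=0.0, sigma_v=2.0)
    print(f"fused location: {mu_av:.2f} deg; fused sigma: {sigma_av:.2f} deg")

Because the fused variance is always smaller than the smaller unimodal variance, this scheme predicts exactly the pattern summarized above: lowered thresholds and faster responses for congruent input, and a bias toward the more reliable modality when the cues conflict, as in the ventriloquist effect (cf. Alais and Burr 2004).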

[1] J. Gore et al. A comparison of bound and unbound audio-visual information processing in the human cerebral cortex. Brain Research: Cognitive Brain Research, 2002.

[2] T. Stanford et al. Challenges in quantifying multisensory integration: alternative criteria, models, and inverse effectiveness. Experimental Brain Research, 2009.

[3] Jürgen Kayser et al. Reference-independent ERP old/new effects of auditory and visual word recognition memory: Joint extraction of stimulus- and response-locked neuronal generator patterns. Psychophysiology, 2007.

[4] Luc H. Arnal et al. Transitions in neural oscillations reflect prediction errors generated in audiovisual speech. Nature Neuroscience, 2011.

[5] Frans A. J. Verstraten et al. The motion aftereffect. Trends in Cognitive Sciences, 1998.

[6] A. Puce et al. Neuronal oscillations and visual amplification of speech. Trends in Cognitive Sciences, 2008.

[7] Rajeev D. S. Raizada et al. Selective Amplification of Stimulus Differences during Categorical Processing of Speech. Neuron, 2007.

[8] Istvan Ulbert et al. Projection from visual areas V2 and prostriata to caudal auditory cortex in the monkey. Cerebral Cortex, 2010.

[9] Kevin G. Munhall et al. Something in the way she moves. Trends in Cognitive Sciences, 2004.

[10] Audrey R. Nath et al. fMRI-Guided Transcranial Magnetic Stimulation Reveals That the Superior Temporal Sulcus Is a Cortical Locus of the McGurk Effect. The Journal of Neuroscience, 2010.

[11] T. Hackett et al. Anatomical mechanisms and functional implications of multisensory convergence in early cortical processing. International Journal of Psychophysiology, 2003.

[12] Christopher R. Fetsch et al. Neural correlates of reliability-based cue weighting during multisensory integration. Nature Neuroscience, 2011.

[13] Emily B. Myers et al. Inferior Frontal Regions Underlie the Perception of Phonetic Category Invariance. Psychological Science, 2009.

[14] Ryan A. Stevenson et al. Superadditive BOLD activation in superior temporal sulcus with threshold non-speech objects. Experimental Brain Research, 2007.

[15] A. Ghazanfar. Unity of the Senses for Primate Vocal Communication. 2012.

[16] Jean Vroomen. Phonetic recalibration in audiovisual speech. 2012.

[17] J. Driver et al. Multisensory Interplay Reveals Crossmodal Influences on ‘Sensory-Specific’ Brain Regions, Neural Responses, and Judgments. Neuron, 2008.

[18] J. Rieger et al. Audiovisual Temporal Correspondence Modulates Human Multisensory Superior Temporal Sulcus Plus Primary Sensory Cortices. The Journal of Neuroscience, 2007.

[19] Mikko Sams et al. Processing of audiovisual speech in Broca's area. NeuroImage, 2005.

[20] Karl J. Friston et al. A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 2005.

[21] S. Shimojo et al. Sensory modalities are not separate modalities: plasticity and interactions. Current Opinion in Neurobiology, 2001.

[22] J. Bizley. The Neural Bases of Multisensory Processes. 2011.

[23] Cristiana Cavina-Pratesi et al. Redundant target effect and intersensory facilitation from visual-tactile interactions in simple reaction time. Experimental Brain Research, 2002.

[24] John J. Foxe et al. Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Brain Research: Cognitive Brain Research, 2002.

[25] Deborah A. Hall et al. Reading Fluent Speech from Talking Faces: Typical Brain Networks and Individual Differences. Journal of Cognitive Neuroscience, 2005.

[26] R. L. Diehl et al. Contrast effects on stop consonant identification. Journal of Experimental Psychology: Human Perception and Performance, 1978.

[27] David Raposo et al. Dynamic weighting of multisensory stimuli shapes decision-making in rats and humans. Journal of Vision, 2013.

[28] D. E. Callan et al. Multimodal contribution to speech perception revealed by independent component analysis: a single-sweep EEG case study. Brain Research: Cognitive Brain Research, 2001.

[29] Istvan Ulbert et al. Multisensory convergence in auditory cortex, II. Thalamocortical connections of the caudal superior temporal plane. The Journal of Comparative Neurology, 2007.

[30] N. Logothetis et al. Integration of Touch and Sound in Auditory Cortex. Neuron, 2005.

[31] G. Aschersleben et al. Automatic visual bias of perceived auditory location. 1998.

[32] Henry Kennedy et al. Long-distance feedback projections to area V1: Implications for multisensory integration, spatial awareness, and visual consciousness. Cognitive, Affective & Behavioral Neuroscience, 2004.

[33] M. Sams et al. Primary auditory cortex activation by visual speech: an fMRI study at 3 T. NeuroReport, 2005.

[34] Wei Ji Ma et al. Lip-Reading Aids Word Recognition Most in Moderate Noise: A Bayesian Explanation Using High-Dimensional Feature Space. PLoS ONE, 2009.

[35] R. Plomp et al. The effect of speechreading on the speech-reception threshold of sentences in noise. The Journal of the Acoustical Society of America, 1987.

[36] K. von Kriegstein. A Multisensory Perspective on Human Auditory Communication. 2012.

[37] Gregory McCarthy et al. Polysensory interactions along lateral temporal regions evoked by audiovisual speech. Cerebral Cortex, 2003.

[38] Ryan A. Stevenson et al. Audiovisual integration in human superior temporal sulcus: Inverse effectiveness and the neural processing of speech and object recognition. NeuroImage, 2009.

[39] Paul J. Laurienti et al. On the use of superadditivity as a metric for characterizing multisensory integration in functional neuroimaging studies. Experimental Brain Research, 2005.

[40] P. Bertelson et al. The After-Effects of Ventriloquism. The Quarterly Journal of Experimental Psychology, 1974.

[41] Lawrence D. Rosenblum. Primacy of Multimodal Speech Perception. 2008.

[42] Paul Bertelson et al. The aftereffects of ventriloquism: Patterns of spatial generalization. Perception & Psychophysics, 2006.

[43] John J. Foxe et al. Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments. Cerebral Cortex, 2006.

[44] E. Bullmore et al. Activation of auditory cortex during silent lipreading. Science, 1997.

[45] R. Bowtell et al. Lip-Reading Ability and Patterns of Cortical Activation Studied Using fMRI. British Journal of Audiology, 2000.

[46] J. Peelle et al. Prediction and constraint in audiovisual speech perception. Cortex, 2015.

[47] M. A. Meredith. Multisensory Integration. 1990.

[48] T. Paus et al. Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 2003.

[49] E. Formisano et al. Auditory Cortex Encodes the Perceptual Interpretation of Ambiguous Sound. The Journal of Neuroscience, 2011.

[50] Jean Vroomen et al. Visual Anticipatory Information Modulates Multisensory Interactions of Artificial Audiovisual Stimuli. Journal of Cognitive Neuroscience, 2010.

[51] J. Gibson et al. Adaptation, after-effect and contrast in the perception of tilted lines. I. Quantitative studies. 1937.

[52] T. Florian Jaeger et al. A Bayesian Belief Updating Model of Phonetic Recalibration and Selective Adaptation. CMCL@ACL, 2011.

[53] Caspar M. Schwiedrzik et al. Untangling Perceptual Memory: Hysteresis and Adaptation Map into Separate Cortical Networks. Cerebral Cortex, 2012.

[54] M. Ernst et al. Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 2002.

[55] B. Argall et al. Integration of Auditory and Visual Information about Objects in Superior Temporal Sulcus. Neuron, 2004.

[56] A. Macleod et al. A procedure for measuring auditory and audio-visual speech-reception thresholds for sentences in noise: rationale, evaluation, and recommendations for use. British Journal of Audiology, 1990.

[57] Luc H. Arnal et al. Cortical oscillations and sensory predictions. Trends in Cognitive Sciences, 2012.

[58] P. Bertelson et al. Adaptation to auditory-visual discordance and ventriloquism in semirealistic situations. 1977.

[59] C. Spence et al. The Handbook of Multisensory Processing. 2004.

[60] J. Vroomen et al. Temporal ventriloquism: sound modulates the flash-lag effect. Journal of Experimental Psychology: Human Perception and Performance, 2004.

[61] Steven L. Small et al. Listening to talking faces: motor cortical activation during speech perception. NeuroImage, 2005.

[62] A. Meltzoff et al. The bimodal perception of speech in infancy. Science, 1982.

[63] N. Bolognini et al. Enhancement of visual perception by crossmodal visuo-auditory interaction. Experimental Brain Research, 2002.

[64] A. Ghazanfar et al. Is neocortex essentially multisensory? Trends in Cognitive Sciences, 2006.

[65] M. Giard et al. Auditory-Visual Integration during Multimodal Object Recognition in Humans: A Behavioral and Electrophysiological Study. Journal of Cognitive Neuroscience, 1999.

[66] P. McGuire et al. Neural systems underlying British Sign Language and audio-visual English processing in native users. Brain, 2002.

[67] M. Sams et al. Time course of multisensory interactions during audiovisual speech perception in humans: a magnetoencephalographic study. Neuroscience Letters, 2004.

[68] Q. Summerfield et al. Audiovisual presentation demonstrates that selective adaptation in speech perception is purely auditory. Perception & Psychophysics, 1981.

[69] Jeffery A. Jones et al. Neural processes underlying perceptual enhancement by visual speech gestures. NeuroReport, 2003.

[70] S. Scott et al. Speech comprehension aided by multiple modalities: Behavioural and neural interactions. Neuropsychologia, 2012.

[71] Béatrice de Gelder et al. Selective adaptation and recalibration of auditory speech by lipread information: Dissipation. AVSP, 2004.

[72] P. Bertelson et al. Visual recalibration and selective adaptation in auditory–visual speech perception: Contrasting build-up courses. Neuropsychologia, 2007.

[73] Michael S. Beauchamp et al. A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography. Journal of Cognitive Neuroscience, 2017.

[74] Lokalisation von Sinneseindrücken bei disparaten Nebenreizen [Localization of sense impressions under disparate accessory stimuli]. 1909.

[75] Brigitte Röder et al. A new method for detecting interactions between the senses in event-related potentials. Brain Research, 2006.

[76] M. Alex Meredith et al. Neurons and behavior: the same rules of multisensory integration apply. Brain Research, 1988.

[77] C. Schroeder et al. Neuronal Oscillations and Multisensory Interaction in Primary Auditory Cortex. Neuron, 2007.

[78] Lee M. Miller et al. Perceptual Fusion and Stimulus Coincidence in the Cross-Modal Integration of Speech. The Journal of Neuroscience, 2005.

[79] Gregor Thut et al. Auditory–Visual Multisensory Interactions in Humans: Timing, Topography, Directionality, and Sources. The Journal of Neuroscience, 2010.

[80] Joost X. Maier et al. Multisensory Integration of Dynamic Faces and Voices in Rhesus Monkey Auditory Cortex. 2005.

[81] Lars Muckli et al. Cortical Plasticity of Audio–Visual Object Representations. Cerebral Cortex, 2008.

[82] Rainer Goebel et al. Analysis of functional image analysis contest (FIAC) data with BrainVoyager QX: From single-subject to cortically aligned group general linear model analysis and self-organizing group independent component analysis. Human Brain Mapping, 2006.

[83] Daphne Bavelier et al. The cortical organization of audio-visual sentence comprehension: an fMRI study at 4 Tesla. Brain Research: Cognitive Brain Research, 2004.

[84] Takaaki Kuratate et al. Linking facial animation, head motion and speech acoustics. Journal of Phonetics, 2002.

[85] Karl J. Friston et al. The effect of prior visual information on recognition of speech and sounds. Cerebral Cortex, 2008.

[86] W. H. Sumby et al. Visual contribution to speech intelligibility in noise. 1954.

[87] C. Gilbert et al. The Neural Basis of Perceptual Learning. Neuron, 2001.

[88] Simon B. Eickhoff et al. Effects of prior information on decoding degraded speech: An fMRI study. Human Brain Mapping, 2012.

[89] P. Bertelson et al. Cross-modal bias and perceptual fusion with auditory-visual spatial discordance. Perception & Psychophysics, 1981.

[90] S. A. Hillyard et al. An analysis of audio-visual crossmodal integration by means of event-related potential (ERP) recordings. Brain Research: Cognitive Brain Research, 2002.

[91] G. Rees et al. Predicting the Stream of Consciousness from Activity in Human Visual Cortex. Current Biology, 2005.

[92] O. Bertrand et al. Visual Activation and Audiovisual Interactions in the Auditory Cortex during Speech Perception: Intracranial Recordings in Humans. The Journal of Neuroscience, 2008.

[93] R. Campbell et al. Evidence from functional magnetic resonance imaging of crossmodal binding in the human heteromodal cortex. Current Biology, 2000.

[94] Michael S. Beauchamp. Statistical criteria in fMRI studies of multisensory integration. Neuroinformatics, 2005.

[95] W. Jiang et al. Two cortical areas mediate multisensory integration in superior colliculus neurons. Journal of Neurophysiology, 2001.

[96] G. Rees et al. Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nature Neuroscience, 2005.

[97] B. Argall et al. Unraveling multisensory integration: patchy organization within human STS multisensory cortex. Nature Neuroscience, 2004.

[98] Jeffery A. Jones et al. Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information. Journal of Cognitive Neuroscience, 2004.

[99] Jean Vroomen et al. Phonetic recalibration only occurs in speech mode. Cognition, 2009.

[100] P. D. Eimas et al. Selective adaptation of linguistic feature detectors. 1973.

[101] Alfred Anwander et al. Direct Structural Connections between Voice- and Face-Recognition Areas. The Journal of Neuroscience, 2011.

[102] A. Fort et al. Bimodal speech: early suppressive visual effects in human auditory cortex. The European Journal of Neuroscience, 2004.

[103] D. Reisberg et al. Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli. 1987.

[104] P. Bertelson et al. Visual Recalibration of Auditory Speech Identification. Psychological Science, 2003.

[105] G. Calvert. Crossmodal processing in the human brain: insights from functional neuroimaging studies. Cerebral Cortex, 2001.

[106] Noël Staeren et al. Sound Categories Are Represented as Distributed Patterns in the Human Auditory Cortex. Current Biology, 2009.

[107] H. McGurk et al. Hearing lips and seeing voices. Nature, 1976.

[108] R. Sekuler et al. Sound alters visual motion perception. Nature, 1997.

[109] Olivier Bertrand et al. Electrophysiological (EEG, sEEG, MEG) evidence for multiple audiovisual interactions in the human auditory cortex. Hearing Research, 2009.

[110] R. E. Remez. Audiovisual Speech Processing: Three puzzles of multimodal speech perception. 2012.

[111] P. McGuire et al. Cortical substrates for the perception of face actions: an fMRI study of the specificity of activation for seen speech and for meaningless lower-face acts (gurning). Brain Research: Cognitive Brain Research, 2001.

[112] E. Bullmore et al. Response amplification in sensory-specific cortices during crossmodal binding. NeuroReport, 1999.

[113] Michael S. Beauchamp et al. Neural Correlates of Interindividual Differences in Children's Audiovisual Speech Perception. The Journal of Neuroscience, 2011.

[114] Jean Vroomen et al. Neural Correlates of Multisensory Integration of Ecologically Valid Audiovisual Events. Journal of Cognitive Neuroscience, 2007.

[115] Asif A. Ghazanfar et al. Interactions between the Superior Temporal Sulcus and Auditory Cortex Mediate Dynamic Face/Voice Integration in Rhesus Monkeys. The Journal of Neuroscience, 2008.

[116] R. B. Welch et al. Effect of Degree of Separation of Visual-Auditory Stimulus and Eye Position upon Spatial Interaction of Vision and Audition. Perceptual and Motor Skills, 1976.

[117] David Poeppel et al. Visual speech speeds up the neural processing of auditory speech. Proceedings of the National Academy of Sciences of the United States of America, 2005.

[118] G. Stratton. Vision without inversion of the retinal image. 1897.

[119] Cheryl M. Capek et al. Seeing speech and seeing sign: Insights from a fMRI study. International Journal of Audiology, 2008.

[120] Jean Vroomen et al. Brain activation during audiovisual exposure anticipates future perception of ambiguous speech. NeuroImage, 2011.

[121] D. Burr et al. The Ventriloquist Effect Results from Near-Optimal Bimodal Integration. Current Biology, 2004.

[122] E. M. Rouiller et al. Multisensory anatomical pathways. Hearing Research, 2009.

[123] Daniel Senkowski et al. Good times for multisensory integration: Effects of the precision of temporal synchrony as revealed by gamma-band oscillations. Neuropsychologia, 2007.

[124] H. Kennedy et al. Anatomical Evidence of Multimodal Integration in Primate Striate Cortex. The Journal of Neuroscience, 2002.

[125] P. Reuter-Lorenz et al. Visual-auditory interactions in sensorimotor processing: saccades versus manual responses. Journal of Experimental Psychology: Human Perception and Performance, 1994.

[126] Béatrice de Gelder et al. Visual recalibration of auditory speech versus selective speech adaptation: different build-up courses. INTERSPEECH, 2004.

[127] A. Samuel. Red herring detectors and speech perception: In defense of selective adaptation. Cognitive Psychology, 1986.

[128] Lynne E. Bernstein et al. Mismatch Negativity with Visual-only and Audiovisual Speech. Brain Topography, 2009.

[129] Matthew H. Davis et al. Predictive Top-Down Integration of Prior Knowledge during Speech Perception. The Journal of Neuroscience, 2012.

[130] Rainer Goebel et al. "Who" Is Saying "What"? Brain-Based Decoding of Human Voice and Speech. Science, 2008.

[131] Audrey R. Nath et al. Dynamic Changes in Superior Temporal Sulcus Connectivity during Perception of Noisy Audiovisual Speech. The Journal of Neuroscience, 2011.

[132] A. Ishai et al. Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex. Science, 2001.

[133] R. L. Diehl et al. Feature detectors for speech: a critical reappraisal. Psychological Bulletin, 1981.

[134] E. Schröger et al. Speeded responses to audiovisual signal changes result from bimodal integration. Psychophysiology, 1998.