Mechanisms of enhancing visual–speech recognition by prior auditory information

Recognizing speech from visual-only (silent) videos of talking faces is difficult, but it can be improved by prior information about what is said. Here, we investigated how the human brain uses prior information from auditory speech to improve visual-speech recognition. In a functional magnetic resonance imaging study, participants performed a visual-speech recognition task, indicating whether the word spoken in visual-only videos matched the preceding auditory-only speech, and a control task (face-identity recognition) that used exactly the same stimuli. We localized a visual-speech processing network by contrasting activity during visual-speech recognition with activity during the control task. Within this network, the left posterior superior temporal sulcus (STS) showed increased activity and interacted with auditory-speech areas when the prior information from auditory speech did not match the visual speech. This mismatch-related activity and the functional connectivity to auditory-speech areas were specific to speech, i.e., they were not present in the control task. The mismatch-related activity correlated positively with performance, indicating that the posterior STS is behaviorally relevant for visual-speech recognition. In line with predictive coding frameworks, these findings suggest that prediction error signals are generated when visually presented speech does not match the prediction derived from the preceding auditory speech, and that this mechanism contributes to the optimization of visual-speech recognition by prior information.
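To make the proposed mechanism explicit, a minimal predictive-coding formulation can be sketched as follows (illustrative notation, not taken from the study):

\[
  \varepsilon = s_{V} - g(s_{A}), \qquad r_{\mathrm{pSTS}} \propto \lVert \varepsilon \rVert ,
\]

where $s_{A}$ denotes the preceding auditory word, $g(\cdot)$ maps it onto a predicted visual-speech representation, $s_{V}$ is the observed visual-speech input, $\varepsilon$ is the resulting prediction error, and $r_{\mathrm{pSTS}}$ is the mismatch-related response in the left posterior STS. On matching trials the prediction error is small; on mismatching trials it is large, consistent with the observed increase in posterior STS activity and its coupling with auditory-speech areas.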
