Hearing through lip-reading: the brain synthesizes features of absent speech

Lip-reading is crucial for understanding speech in challenging listening conditions. Neuroimaging investigations have revealed that lip-reading activates the auditory cortices of individuals covertly repeating absent, but known, speech. In real life, however, one usually has no detailed prior information about the content of upcoming speech. Here we show that during silent lip-reading of unknown speech, activity in auditory cortices entrains more to the absent speech sound than to the seen lip movements at frequencies below 1 Hz. This entrainment to absent speech was characterized by a speech-to-brain delay of 50–100 ms, similar to that observed when actually listening to speech. We also observed entrainment to lip movements at the same low frequencies in the right angular gyrus, an area involved in processing biological motion. These findings demonstrate that, from lip-reading alone, the brain can synthesize high-level features of absent, unknown speech sounds, which may facilitate the processing of the auditory input. Such a synthesis process may help explain well-documented bottom-up perceptual effects.
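
The central measure here, entrainment, can be illustrated with a minimal Python sketch. This is not the authors' MEG pipeline; all signals, parameters, and the 75 ms delay are synthetic and purely illustrative. It shows how coherence between a speech envelope and a cortical signal quantifies low-frequency (<1 Hz) entrainment, and how a speech-to-brain delay in the reported 50–100 ms range can be read off the peak of the cross-correlation.

# Illustrative sketch of entrainment analysis on synthetic data
# (assumed tooling: numpy and scipy only; not the authors' code).
import numpy as np
from scipy.signal import coherence, correlate

fs = 200.0                      # sampling rate in Hz (illustrative)
t = np.arange(0, 300, 1 / fs)   # 5 minutes of data
rng = np.random.default_rng(0)

# Toy "speech envelope": noise smoothed with a 2 s window, so its power
# is concentrated below ~1 Hz, as in the frequency band of interest.
envelope = np.convolve(rng.standard_normal(t.size),
                       np.hanning(int(2 * fs)), mode="same")

# Toy "auditory-cortex" signal: the envelope delayed by 75 ms plus noise,
# mimicking the reported 50-100 ms speech-to-brain delay.
delay_samples = int(0.075 * fs)
brain = np.roll(envelope, delay_samples) + 10 * rng.standard_normal(t.size)

# Coherence spectrum (Welch segments of 10 s, i.e. 0.1 Hz resolution);
# entrainment shows up as elevated coherence below 1 Hz.
f, coh = coherence(envelope, brain, fs=fs, nperseg=int(10 * fs))
low = (f > 0) & (f < 1.0)
print(f"mean coherence below 1 Hz: {coh[low].mean():.2f}")

# Delay estimate: lag of the cross-correlation peak
# (positive lag means the brain signal follows the speech envelope).
xcorr = correlate(brain - brain.mean(), envelope - envelope.mean(), mode="full")
lag = (np.argmax(xcorr) - (envelope.size - 1)) / fs
print(f"estimated speech-to-brain delay: {lag * 1000:.0f} ms")

On this synthetic example the script recovers the built-in 75 ms lag; in the actual study, the analogous comparison is between coherence with the (absent) speech sound and coherence with the seen lip movements, computed on source-reconstructed MEG signals.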
