Visual Input Enhances Selective Speech Envelope Tracking in Auditory Cortex at a “Cocktail Party”

Our ability to selectively attend to one auditory signal amid competing input streams, epitomized by the “Cocktail Party” problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared with responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker's face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a Cocktail Party setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive.
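As an illustration of what "tracking the temporal speech envelope" can mean quantitatively, the sketch below extracts a slow amplitude envelope from two simulated talkers and correlates each envelope with a simulated neural response. It is not the authors' analysis pipeline: the abstract does not specify the two methods used, and everything here is an assumption made for illustration, including the 8 Hz envelope cutoff, the 200 Hz sampling rate, the attentional gain of 0.3 for the ignored talker, the noise level, and all function names. The "response" is synthetic and evaluated at zero lag, whereas real analyses must account for neural latency.

```python
import numpy as np


def lowpass(x, fs, cutoff_hz):
    """Crude low-pass filter: zero every FFT bin above cutoff_hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[freqs > cutoff_hz] = 0
    return np.fft.irfft(spec, n=len(x))


def envelope(signal, fs, cutoff_hz=8.0):
    """Temporal envelope: magnitude of the analytic signal, then low-passed."""
    n = len(signal)
    # Build the analytic signal by removing negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(signal) * weights)
    return lowpass(np.abs(analytic), fs, cutoff_hz)


def zscore(x):
    """Normalize to zero mean and unit variance."""
    return (x - x.mean()) / x.std()


def tracking_score(response, env):
    """Pearson correlation between a response and a stimulus envelope."""
    return float(np.corrcoef(response, env)[0, 1])


def demo(seed=0, fs=200, seconds=20):
    """Simulate two talkers and a response dominated by the attended one."""
    rng = np.random.default_rng(seed)
    n = fs * seconds

    def talker():
        # Slow positive modulator imposed on a white-noise carrier.
        mod = lowpass(rng.standard_normal(n), fs, 8.0)
        mod = mod - mod.min() + 0.1
        return mod * rng.standard_normal(n)

    env_attended = zscore(envelope(talker(), fs))
    env_ignored = zscore(envelope(talker(), fs))

    # Hypothetical response: full gain for the attended envelope,
    # reduced gain (0.3) for the ignored one, plus sensor noise.
    response = env_attended + 0.3 * env_ignored + 0.5 * rng.standard_normal(n)
    return (tracking_score(response, env_attended),
            tracking_score(response, env_ignored))


attended_score, ignored_score = demo()
print("attended tracking:", round(attended_score, 3))
print("ignored tracking: ", round(ignored_score, 3))
```

In this toy setting, a larger gap between the two scores corresponds to stronger attentional selectivity. The abstract's claim is that such preferential tracking of the attended speaker emerged in the Cocktail Party condition when the speaker's face was visible.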
