Left Superior Temporal Gyrus Is Coupled to Attended Speech in a Cocktail-Party Auditory Scene

In a continuous listening task, we evaluated the coupling between the listener's cortical activity and the temporal envelopes of different sounds in a multitalker auditory scene using magnetoencephalography and corticovocal coherence analysis. Neuromagnetic signals were recorded from 20 right-handed healthy adult humans who listened to five different recorded stories (attended speech streams): one without any multitalker background (No noise) and four mixed with a “cocktail party” multitalker background noise at four signal-to-noise ratios (5, 0, −5, and −10 dB) to produce speech-in-noise mixtures, here referred to as the Global scene. Coherence analysis revealed that the modulations of the attended speech stream, presented without multitalker background, were coupled at ∼0.5 Hz to the activity of both superior temporal gyri, whereas the modulations at 4–8 Hz were coupled to the activity of the right supratemporal auditory cortex. In the cocktail-party conditions, the coupling at both frequencies was stronger for the attended speech stream than for the unattended multitalker background. Coupling strength decreased as the level of the multitalker background increased. During the cocktail-party conditions, the ∼0.5 Hz coupling became left-hemisphere dominant, compared with bilateral coupling without the multitalker background, whereas the 4–8 Hz coupling remained right-hemisphere lateralized in both conditions. The brain activity was not coupled to the multitalker background or to its individual talkers. The results highlight the key role of the listener's left superior temporal gyrus in extracting the slow ∼0.5 Hz modulations, likely reflecting the attended speech stream within a multitalker auditory scene.

SIGNIFICANCE STATEMENT
When people listen to one person in a “cocktail party,” their auditory cortex mainly follows the attended speech stream rather than the entire auditory scene. However, how the brain extracts the attended speech stream from the whole auditory scene and how increasing background noise corrupts this process are still debated. In this magnetoencephalography study, subjects had to attend to a speech stream with or without multitalker background noise. Results argue for frequency-dependent cortical tracking mechanisms for the attended speech stream. The left superior temporal gyrus tracked the ∼0.5 Hz modulations of the attended speech stream only when the speech was embedded in multitalker background, whereas the right supratemporal auditory cortex tracked 4–8 Hz modulations during both noiseless and cocktail-party conditions.
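The two operations named above, mixing speech with a background at a given signal-to-noise ratio and estimating the coherence between a sound's temporal envelope and a cortical signal, can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline: the function names (`mix_at_snr`, `envelope_coherence`), the RMS-based SNR definition, the Hilbert-transform envelope, the Welch segment length, and all signals are assumptions chosen for illustration, and the data are synthetic.

```python
import numpy as np
from scipy.signal import coherence, hilbert


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the RMS ratio equals snr_db (assumed SNR definition)."""
    rms_speech = np.sqrt(np.mean(speech ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))
    gain = rms_speech / (rms_noise * 10 ** (snr_db / 20))
    return speech + gain * noise


def envelope_coherence(audio, brain, fs, nperseg):
    """Magnitude-squared coherence between the audio envelope and a brain signal."""
    envelope = np.abs(hilbert(audio))  # temporal envelope via the analytic signal
    return coherence(envelope, brain, fs=fs, nperseg=nperseg)


# Synthetic demonstration (hypothetical signals, not recorded data).
rng = np.random.default_rng(0)
fs = 200.0                       # sampling rate in Hz (assumed)
t = np.arange(0, 300, 1 / fs)    # 300 s of signal

# "Attended speech": a noise carrier whose amplitude is modulated at 0.5 Hz.
slow_modulation = 1 + 0.8 * np.sin(2 * np.pi * 0.5 * t)
speech = slow_modulation * rng.standard_normal(t.size)

# "Cortical signal": follows the 0.5 Hz modulation with a phase lag, buried in noise.
brain = np.sin(2 * np.pi * 0.5 * t - 0.6) + 2 * rng.standard_normal(t.size)

# Speech-in-noise mixture at -5 dB, one of the ratios used in the study.
background = rng.standard_normal(t.size)
mixture = mix_at_snr(speech, background, snr_db=-5)

# About 10 s segments give a frequency resolution of roughly 0.1 Hz.
freqs, coh = envelope_coherence(speech, brain, fs=fs, nperseg=2048)
slow_band = (freqs >= 0.3) & (freqs <= 0.7)
high_band = (freqs >= 20) & (freqs <= 40)
print("peak coherence, 0.3-0.7 Hz:", coh[slow_band].max())
print("median coherence, 20-40 Hz:", np.median(coh[high_band]))
```

In this toy setting the coherence peaks near the 0.5 Hz modulation rate and stays near the noise floor elsewhere. Passing `mixture` or `background` instead of `speech` to `envelope_coherence` gives the analogous comparison between the attended stream, the Global scene, and the unattended background.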
