Prior Knowledge Guides Speech Segregation in Human Auditory Cortex

Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g., pitch) with top-down prior knowledge about the sound streams. In a multi-talker environment, the brain can segregate different speakers in auditory cortex in about 100 ms. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signatures of how the brain uses prior knowledge to segregate two speech streams from the same speaker, which can hardly be separated on the basis of bottom-up acoustic cues. In a primed condition, participants know the target speech stream in advance, whereas in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate that an effect in bilateral superior temporal gyrus and superior temporal sulcus is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation mainly by suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cues.
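
As an illustration of what "responses tracking the speech envelope" can mean in practice, the sketch below estimates a lagged linear mapping from a speech envelope to a single response channel using ridge regression. This is a generic envelope-tracking analysis under stated assumptions, not necessarily the procedure used in this study: the sampling rate, lag window, ridge parameter, simulated envelopes, and the function names `lagged_design` and `estimate_trf` are all hypothetical and chosen only for the example.

```python
import numpy as np

def lagged_design(envelope, n_lags):
    """Build a matrix whose column k is the envelope delayed by k samples."""
    n = len(envelope)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = envelope[:n - k]
    return X

def estimate_trf(envelope, response, n_lags, ridge=1.0):
    """Ridge-regression estimate of the envelope-to-response filter."""
    X = lagged_design(envelope, n_lags)
    XtX = X.T @ X + ridge * np.eye(n_lags)
    return np.linalg.solve(XtX, X.T @ response)

# Simulated data (all values are assumptions for illustration only).
rng = np.random.default_rng(0)
fs = 100          # sampling rate in Hz (hypothetical)
n = 6000          # 60 s of data
n_lags = 70       # lags from 0 to 690 ms

def fake_envelope():
    env = np.abs(rng.standard_normal(n))
    return env - env.mean()  # zero-mean surrogate envelope

env_target = fake_envelope()
env_nontarget = fake_envelope()

# Ground-truth filter: a single peak at lag 10 (100 ms at 100 Hz).
true_trf = np.zeros(n_lags)
true_trf[10] = 1.0

# The simulated response tracks the target strongly and the
# non-target stream weakly, plus sensor noise.
response = (np.convolve(env_target, true_trf)[:n]
            + 0.2 * np.convolve(env_nontarget, true_trf)[:n]
            + 0.5 * rng.standard_normal(n))

trf_target = estimate_trf(env_target, response, n_lags)
trf_nontarget = estimate_trf(env_nontarget, response, n_lags)
peak_latency_ms = 1000 * np.argmax(trf_target) / fs
```

In this toy setting, a weaker filter for the non-target envelope than for the target envelope plays the role of suppressed neural tracking of the non-target stream; comparing such filters across conditions is one way a priming effect could be quantified.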
