Emergence of neural encoding of auditory objects while listening to competing speakers

A visual scene is perceived in terms of visual objects. Similar ideas have been proposed for the analogous case of auditory scene analysis, although their hypothesized neural underpinnings have not yet been established. Here, we address this question by using magnetoencephalography to record from subjects selectively listening to one of two competing speakers, of either different or the same sex. Individual neural representations are seen for the speech of the two speakers. Each representation is selectively phase locked to the rhythm of the corresponding speech stream, and from each the temporal envelope of that speech stream, and only that stream, can be reconstructed. The neural representation of the attended speech dominates responses (with latency near 100 ms) in posterior auditory cortex. Furthermore, when the intensities of the attended and background speakers are varied separately over an 8-dB range, the neural representation of the attended speech adapts only to the intensity of that speaker and not to the intensity of the background speaker, suggesting an object-level intensity gain control. In summary, these results indicate that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.
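The idea of reconstructing one speech stream's temporal envelope from multichannel neural recordings can be illustrated with a toy linear decoder. The sketch below is not the analysis used in this study: it assumes a ridge-regularized least-squares decoder over time-lagged channels, and every signal, function name, and parameter (`simulate`, `fit_decoder`, the 10-sample delay, the 0.3 background gain, the ridge value) is hypothetical and chosen only for illustration. Synthetic "responses" carry a strong delayed copy of an attended envelope and a weaker copy of a background envelope; a decoder trained on the attended envelope is then scored against both envelopes on held-out data.

```python
import numpy as np

def lagged_design(responses, n_lags):
    """Build a matrix whose row t holds responses at times t .. t+n_lags-1.

    The neural response follows the stimulus, so the envelope at time t is
    decoded from responses at the same and later samples (zero padded at the end).
    """
    n_times, n_channels = responses.shape
    X = np.zeros((n_times, n_channels * n_lags))
    for lag in range(n_lags):
        cols = slice(lag * n_channels, (lag + 1) * n_channels)
        X[:n_times - lag, cols] = responses[lag:]
    return X

def fit_decoder(responses, envelope, n_lags, ridge=1.0):
    """Ridge-regularized least-squares weights mapping lagged responses to an envelope."""
    X = lagged_design(responses, n_lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ envelope)

def reconstruct(responses, weights, n_lags):
    """Apply a fitted decoder to (possibly new) responses."""
    return lagged_design(responses, n_lags) @ weights

def delayed(signal, delay):
    """Shift a signal later in time by `delay` samples, padding with zeros."""
    return np.concatenate([np.zeros(delay), signal[:-delay]])

def simulate(seed=0, n_times=4000, n_channels=8, delay=10):
    """Hypothetical data: two independent slow envelopes mixed into noisy channels."""
    rng = np.random.default_rng(seed)
    kernel = np.hanning(50)  # smoothing makes the envelopes slowly varying
    env_att = np.convolve(rng.standard_normal(n_times), kernel, mode="same")
    env_bg = np.convolve(rng.standard_normal(n_times), kernel, mode="same")
    mix_att = rng.standard_normal(n_channels)
    mix_bg = rng.standard_normal(n_channels)
    # Assumption: the attended stream drives the channels more strongly (gain 1.0
    # versus 0.3), and both streams arrive with the same fixed delay.
    responses = (np.outer(delayed(env_att, delay), mix_att)
                 + 0.3 * np.outer(delayed(env_bg, delay), mix_bg)
                 + rng.standard_normal((n_times, n_channels)))
    return env_att, env_bg, responses

def demo(n_lags=20):
    """Train on the first half, then correlate the held-out reconstruction
    with the attended and the background envelope."""
    env_att, env_bg, responses = simulate()
    half = len(env_att) // 2
    weights = fit_decoder(responses[:half], env_att[:half], n_lags)
    estimate = reconstruct(responses[half:], weights, n_lags)
    r_att = np.corrcoef(estimate, env_att[half:])[0, 1]
    r_bg = np.corrcoef(estimate, env_bg[half:])[0, 1]
    return r_att, r_bg

if __name__ == "__main__":
    r_att, r_bg = demo()
    print(f"correlation with attended envelope:   {r_att:.2f}")
    print(f"correlation with background envelope: {r_bg:.2f}")
```

In this toy setting the held-out reconstruction should track the attended envelope far better than the background one, which is the sense in which a decoder can be selective for one stream. Real recordings would additionally require envelope extraction from the audio, denoising, and cross-validated choice of the lag window and regularization.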
