Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception

Humans can easily focus on one speaker in a multi-talker acoustic environment, but how different areas of the human auditory cortex (AC) represent the acoustic components of mixed speech is unknown. We obtained invasive recordings from the primary and nonprimary AC in neurosurgical patients as they listened to multi-talker speech. We found that neural sites in the primary AC responded to individual speakers in the mixture and were relatively unchanged by attention. In contrast, neural sites in the nonprimary AC were less discerning of individual speakers but selectively represented the attended speaker. Moreover, the encoding of the attended speaker in the nonprimary AC was invariant to the degree of acoustic overlap with the unattended speaker. Finally, this emergent representation of attended speech in the nonprimary AC was linearly predictable from the primary AC responses. Our results reveal the neural computations underlying the hierarchical formation of auditory objects in human AC during multi-talker speech perception.
