Spatiotemporal Dynamics of Sound Representations Reveal a Hierarchical Progression of Category Selectivity

It remains unclear whether the human brain, as it transforms incoming sounds, assigns semantic meaning via distributed, domain-general architectures or via specialized hierarchical streams. Here we show that the spatiotemporal progression from acoustically to semantically dominated representations is consistent with a hierarchical processing scheme. Combining magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) patterns, we found superior temporal responses beginning ~80 ms after stimulus onset and spreading to extratemporal cortices by ~130 ms. Early acoustically dominated representations trended systematically toward semantic category dominance over time (after ~200 ms) and space (beyond primary cortex). Semantic category representation was spatially specific: vocalizations were preferentially distinguished in temporal and frontal voice-selective regions and the fusiform face area, whereas scene and object sounds were distinguished in parahippocampal and medial place areas. Our results are consistent with an extended auditory processing hierarchy in which acoustic representations give rise to multiple streams specialized by category, including areas typically considered visual cortex.
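One common way to combine MEG and fMRI patterns is similarity-based fusion: a representational dissimilarity matrix (RDM) is computed from MEG sensor patterns at each time point and correlated with an RDM computed from fMRI voxel patterns in a region of interest. The abstract does not spell out the analysis, so the following is only a minimal sketch under that assumption. The function names (`rdm`, `fuse`), the array shapes, the distance metric (1 minus Pearson correlation), and the synthetic data are all hypothetical illustrations, not the authors' pipeline.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm(patterns):
    """Condensed RDM: 1 - Pearson correlation between condition patterns.

    patterns has shape (n_conditions, n_features).
    """
    return pdist(patterns, metric="correlation")


def fuse(meg, fmri_roi):
    """Correlate the MEG RDM at each time point with one fMRI ROI RDM.

    meg:      (n_times, n_conditions, n_sensors)
    fmri_roi: (n_conditions, n_voxels)
    Returns an array of Spearman correlations, one per time point.
    """
    roi_rdm = rdm(fmri_roi)
    out = np.empty(meg.shape[0])
    for t in range(meg.shape[0]):
        rho, _ = spearmanr(rdm(meg[t]), roi_rdm)
        out[t] = rho
    return out


# Synthetic demonstration (hypothetical data, not from the study):
# a shared latent condition structure is present in the fMRI ROI and
# appears in the MEG signal only from time index 5 onward.
rng = np.random.default_rng(0)
n_cond, n_latent, n_vox, n_sens, n_times = 12, 5, 200, 100, 10

latent = rng.normal(size=(n_cond, n_latent))
fmri_roi = latent @ rng.normal(size=(n_latent, n_vox))
fmri_roi += 0.1 * rng.normal(size=fmri_roi.shape)

meg = rng.normal(size=(n_times, n_cond, n_sens))  # noise only
mixing = rng.normal(size=(n_latent, n_sens))
for t in range(5, n_times):
    meg[t] = latent @ mixing + 0.1 * rng.normal(size=(n_cond, n_sens))

fusion = fuse(meg, fmri_roi)
early, late = fusion[:5], fusion[5:]
print("mean early fusion:", early.mean())
print("mean late fusion: ", late.mean())
```

In this toy setup the fusion time course stays near zero while the MEG patterns are pure noise and rises once the shared structure appears, which is the kind of signature used to say when a region's representational geometry emerges in the MEG signal.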
