The Hierarchical Cortical Organization of Human Speech Processing

Speech comprehension requires that the brain extract semantic meaning from the spectral features represented at the cochlea. To investigate this process, we performed an fMRI experiment in which five men and two women passively listened to several hours of natural narrative speech. We then used voxelwise modeling to predict BOLD responses based on three different feature spaces that represent the spectral, articulatory, and semantic properties of speech. The amount of variance explained by each feature space was then assessed using a separate validation dataset. Because some responses might be explained equally well by more than one feature space, we used a variance partitioning analysis to determine the fraction of the variance that was uniquely explained by each feature space. Consistent with previous studies, we found that speech comprehension involves hierarchical representations starting in primary auditory areas and moving laterally on the temporal lobe: spectral features are found in the core of A1, mixtures of spectral and articulatory features in STG, mixtures of articulatory and semantic features in STS, and semantic features in STS and beyond. Our data also show that both hemispheres are equally and actively involved in speech perception and interpretation. Further, responses as early in the auditory hierarchy as STS are more correlated with semantic than spectral representations. These results illustrate the importance of using natural speech in neurolinguistic research. Our methodology also provides an efficient way to simultaneously test multiple specific hypotheses about the representations of speech without using block designs and segmented or synthetic speech.
SIGNIFICANCE STATEMENT To investigate the processing steps performed by the human brain to transform natural speech sounds into meaningful language, we used models based on a hierarchical set of speech features to predict BOLD responses of individual voxels recorded in an fMRI experiment while subjects listened to natural speech. Both cerebral hemispheres were actively and equally involved in speech processing. Also, the transformation from spectral features to semantic elements occurs early in the cortical speech-processing stream. Our experimental and analytical approaches are important alternatives and complements to standard approaches that use segmented speech and block designs, which report greater laterality in speech processing and assign semantic processing to higher levels of cortex than reported here.
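The variance partitioning described above can be sketched in code. The toy example below is an illustrative reconstruction, not the authors' actual pipeline: the feature spaces, voxel response, and regression settings are all simulated, and a simple closed-form ridge stands in for the regularized voxelwise models used in studies of this kind. The idea it demonstrates is real, though: fit encoding models on every subset of feature spaces, measure R² on held-out data, and take the unique variance of a space as the drop in R² when that space is removed from the full model.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)


def ridge_fit_predict(X_train, y_train, X_val, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + alpha*I)^-1 X'y."""
    n_feat = X_train.shape[1]
    w = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(n_feat),
                        X_train.T @ y_train)
    return X_val @ w


def r_squared(y_true, y_pred):
    """Fraction of variance explained on the validation set."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Simulated feature spaces (spectral, articulatory, semantic), 5 features each.
n_train, n_val = 400, 100
spaces = {name: rng.standard_normal((n_train + n_val, 5))
          for name in ("spectral", "articulatory", "semantic")}

# One simulated voxel, driven mostly by the semantic space, plus noise.
y = (0.2 * spaces["spectral"][:, 0]
     + 1.0 * spaces["semantic"][:, 0]
     + 0.5 * rng.standard_normal(n_train + n_val))


def r2_for(subset):
    """Fit on the training half using the given feature spaces; score on validation."""
    X = np.hstack([spaces[name] for name in subset])
    pred = ridge_fit_predict(X[:n_train], y[:n_train], X[n_train:])
    return r_squared(y[n_train:], pred)


# R^2 for every nonempty subset of feature spaces.
names = list(spaces)
r2 = {frozenset(c): r2_for(c)
      for k in (1, 2, 3) for c in combinations(names, k)}

# Unique variance of each space: R^2 of the full model minus R^2 without it.
full = r2[frozenset(names)]
unique = {n: full - r2[frozenset(set(names) - {n})] for n in names}
for n in names:
    print(f"unique({n}) = {unique[n]:.3f}")
```

With these simulated weights, the semantic space carries most of the unique variance and the articulatory space close to none, mirroring the logic (if not the scale) of the voxelwise analysis: shared variance is credited to no single space, and only the irreducible contribution of each space is reported as unique.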
