The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts

Speech contains temporal structure that the brain must analyze to enable linguistic processing. To investigate the neural basis of this analysis, we used sound quilts, stimuli constructed by shuffling segments of a natural sound, approximately preserving its properties on short timescales while disrupting them on longer scales. We generated quilts from foreign speech to eliminate language cues and manipulated the extent of natural acoustic structure by varying the segment length. Using functional magnetic resonance imaging, we identified bilateral regions of the superior temporal sulcus (STS) whose responses varied with segment length. This effect was absent in primary auditory cortex and did not occur for quilts made from other natural sounds or acoustically matched synthetic sounds, suggesting tuning to speech-specific spectrotemporal structure. When examined parametrically, the STS response increased with segment length up to ∼500 ms. Our results identify a locus of speech analysis in human auditory cortex that is distinct from lexical, semantic or syntactic processes.
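As a rough illustration of the quilting idea described above, the following minimal sketch cuts a signal into fixed-length segments and concatenates them in shuffled order. It is a simplified assumption, not the authors' stimulus-generation procedure: the function name `make_quilt`, its parameters, and the purely random reordering are hypothetical, and the sketch omits any segment-selection or boundary-smoothing steps the actual stimuli may rely on to keep short-timescale properties approximately intact.

```python
import random


def make_quilt(samples, segment_len, seed=0):
    """Simplified sound-quilt sketch: shuffle fixed-length segments.

    `samples` is a list of audio samples and `segment_len` is the segment
    length in samples. Both names are illustrative assumptions.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    # Cut the signal into equal-length segments, dropping any short remainder.
    n_segments = len(samples) // segment_len
    segments = [
        samples[i * segment_len:(i + 1) * segment_len]
        for i in range(n_segments)
    ]
    # Reorder whole segments at random. Structure inside each segment is
    # preserved, while structure spanning several segments is disrupted.
    rng = random.Random(seed)
    rng.shuffle(segments)
    # Concatenate the reordered segments into one signal.
    return [x for seg in segments for x in seg]


# Toy "signal": sample values equal their original positions.
signal = list(range(12))
quilt = make_quilt(signal, segment_len=3, seed=1)
```

In this toy setting, a longer `segment_len` leaves longer runs of the original signal intact, which corresponds to the segment-length manipulation used to vary the extent of natural acoustic structure.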
