A neural mechanism for recognizing speech spoken by different speakers

Understanding speech from different speakers is a sophisticated process, particularly because the same acoustic parameters convey important information about both the speech message and the person speaking. How the human brain accomplishes speech recognition under such conditions is unknown. One view is that speaker information is discarded at early processing stages and not used for understanding the speech message. An alternative view is that speaker information is exploited to improve speech recognition. Consistent with the latter view, previous research identified functional interactions between the left- and right-hemispheric superior temporal sulcus/gyrus, which process speech- and speaker-specific vocal tract parameters, respectively. Vocal tract parameters are one of the two major acoustic features that determine both speaker identity and speech message (phonemes). Here, using functional magnetic resonance imaging (fMRI), we show that a similar interaction exists for glottal fold parameters between the left and right Heschl's gyri. Glottal fold parameters are the other main acoustic feature that determines speaker identity and speech message (linguistic prosody). The findings suggest that interactions between left- and right-hemispheric areas are specific to the processing of different acoustic features of speech and speaker, and that they represent a general neural mechanism for understanding speech from different speakers.
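
The two acoustic cues named above can be made concrete with a toy resynthesis: the glottal pulse rate (F0) is a glottal fold parameter, and the formant pattern imposed by vocal tract length is a vocal tract parameter. The Python sketch below is purely illustrative and is not the stimulus pipeline used in the study; the function names (synth_vowel, formant_filter), the formant values, and the scaling factors are assumptions chosen for demonstration.

```python
# Illustrative sketch only (not the paper's stimuli): how glottal pulse rate (F0)
# and vocal-tract-related formant scaling jointly shape a vowel-like sound.
import numpy as np
from scipy.signal import lfilter

FS = 16000  # sampling rate in Hz

def formant_filter(x, formants, bandwidths, fs=FS):
    """Pass a source signal through cascaded second-order resonators,
    one per formant frequency (Hz). Formant values are rough textbook estimates."""
    y = x
    for f, bw in zip(formants, bandwidths):
        r = np.exp(-np.pi * bw / fs)               # pole radius from bandwidth
        theta = 2 * np.pi * f / fs                 # pole angle from formant frequency
        a = [1.0, -2 * r * np.cos(theta), r ** 2]  # resonator denominator
        y = lfilter([1.0 - r], a, y)               # simple gain-normalised numerator
    return y

def synth_vowel(f0, formants, dur=0.4, fs=FS):
    """Toy vowel: glottal pulse train at rate f0 (glottal fold cue)
    shaped by the formant filter (vocal tract cue)."""
    n = int(dur * fs)
    source = np.zeros(n)
    period = int(round(fs / f0))
    source[::period] = 1.0                         # impulse train at the glottal pulse rate
    return formant_filter(source, formants, bandwidths=[80, 100, 120])

# /a/-like formant pattern; the 1.15 scaling crudely mimics a shorter vocal tract
F_A = [700, 1200, 2600]
speaker_low  = synth_vowel(f0=110, formants=F_A)
speaker_high = synth_vowel(f0=220, formants=[f * 1.15 for f in F_A])
```

In this toy example the phoneme category (the /a/-like formant pattern) is shared, while the glottal fold parameter (F0) and the vocal tract parameter (formant scaling) differ between the two renditions; this is exactly the dual role of the same acoustic parameters, carrying both speaker and message information, that the abstract describes.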
