Tracing the emergence of categorical speech perception in the human auditory system

Speech perception requires the effortless mapping of smooth, seemingly continuous changes in sound features onto discrete perceptual units, a conversion exemplified by the phenomenon of categorical perception. Explaining how and when the human brain performs this acoustic-phonetic transformation remains an elusive problem in current models and theories of speech perception. In previous attempts to decipher the neural basis of speech perception, it is often unclear whether the alleged brain correlates reflect an underlying percept or merely changes in neural activity that covary with parameters of the stimulus. Here, we recorded neuroelectric activity generated at both cortical and subcortical levels of the auditory pathway, elicited by a speech vowel continuum whose percept varied categorically from /u/ to /a/. This integrative approach allows us to characterize how various auditory structures code, transform, and ultimately render the perception of speech material, as well as to dissociate brain responses reflecting changes in stimulus acoustics from those that index true internalized percepts. We find that activity from the brainstem mirrors properties of the speech waveform with remarkable fidelity, reflecting progressive changes in speech acoustics but not the discrete phonetic classes reported behaviorally. In comparison, patterns of late cortical evoked activity contain information reflecting distinct perceptual categories and predict the abstract phonetic speech boundaries heard by listeners. Our findings demonstrate a critical transformation in neural speech representations between brainstem and early auditory cortex, analogous to the acoustic-phonetic mapping necessary to generate categorical speech percepts.
Analytic modeling demonstrates that a simple nonlinearity accounts for the transformation between early (subcortical) brain activity and subsequent cortical/behavioral responses to speech (>150-200 ms), thereby describing a plausible mechanism by which the brain achieves its acoustic-to-phonetic mapping. Results provide evidence that the neurophysiological underpinnings of categorical speech perception are present cortically by ~175 ms after sound enters the ear.
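
The abstract does not state the form of the nonlinearity, so the following is only an illustrative sketch and not the authors' model. It assumes a logistic (sigmoid) function applied to a hypothetical five-step, normalized vowel continuum; the function names, the midpoint, and the slope are all assumptions chosen for illustration. It shows how a graded, evenly spaced input code can yield an output whose largest changes cluster around a category boundary.

```python
import math


def logistic(x, midpoint=0.5, slope=12.0):
    """Sigmoid nonlinearity. The midpoint and slope are illustrative
    assumptions, not values taken from the study."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def continuum(n_steps):
    """Evenly spaced positions on a normalized acoustic continuum
    (0.0 stands in for /u/, 1.0 for /a/)."""
    return [i / (n_steps - 1) for i in range(n_steps)]


def adjacent_differences(values):
    """Change between each pair of neighboring continuum steps."""
    return [b - a for a, b in zip(values, values[1:])]


# Graded code: each step differs from its neighbor by the same amount.
acoustic = continuum(5)

# After the nonlinearity: steps near the ends are compressed together,
# while steps straddling the midpoint are pulled apart.
categorical = [logistic(x) for x in acoustic]

if __name__ == "__main__":
    for x, y in zip(acoustic, categorical):
        print(f"input {x:.2f} -> output {y:.3f}")
    print("input step sizes: ", adjacent_differences(acoustic))
    print("output step sizes:", adjacent_differences(categorical))
```

In this sketch the input step sizes are all equal, whereas the output step sizes are small within each end of the continuum and large across its middle, which is the qualitative signature of categorical identification. A steeper slope makes the boundary sharper; the choice of a logistic function is a convenience, and other saturating nonlinearities would behave similarly.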

[1]  Emily B. Myers,et al.  Effects of Category Learning on Neural Sensitivity to Non-native Phonetic Categories , 2012, Journal of Cognitive Neuroscience.

[2]  Philip J. Monahan,et al.  Auditory sensitivity to formant ratios: Toward an account of vowel normalisation , 2010, Language and cognitive processes.

[3]  Lars Riecke,et al.  Hearing Illusory Sounds in Noise: The Timing of Sensory-Perceptual Transformations in Auditory Cortex , 2009, Neuron.

[4]  N. Mesgarani,et al.  Selective cortical representation of attended speaker in multi-talker speech perception , 2012, Nature.

[5]  D. Pisoni,et al.  Acoustic-phonetic representations in word recognition , 1987, Cognition.

[6]  Elvira Brattico,et al.  Orderly cortical representation of vowel categories presented by multiple exemplars. , 2004, Brain research. Cognitive brain research.

[7]  David J. Freedman,et al.  Neural correlates of categories and concepts , 2003, Current Opinion in Neurobiology.

[8]  I. Pollack,et al.  On the comparison between identification and discrimination tests in speech perception , 1971 .

[9]  Christopher J. Plack,et al.  The Frequency Following Response (FFR) May Reflect Pitch-Bearing Information But is Not a Direct Representation of Pitch , 2011, Journal of the Association for Research in Otolaryngology.

[10]  C C Wood,et al.  Auditory and phonetic levels of processing in speech perception: neurophysiological and information-processing analyses. , 1975, Journal of experimental psychology. Human perception and performance.

[11]  G. Galbraith,et al.  Selective attention affects human brain stem frequency-following response , 2003, Neuroreport.

[12]  John E. Markel,et al.  Linear Prediction of Speech , 1976, Communication and Cybernetics.

[13]  C. C. Wood,et al.  Auditory Evoked Potentials during Speech Perception , 1971, Science.

[14]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[15]  R. Carlyon How the brain separates sounds , 2004, Trends in Cognitive Sciences.

[16]  Arnaud Delorme,et al.  EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis , 2004, Journal of Neuroscience Methods.

[17]  R N Shepard,et al.  Multidimensional Scaling, Tree-Fitting, and Clustering , 1980, Science.

[18]  N Suga,et al.  The corticofugal system for hearing: recent progress. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[19]  Mathias Scharinger,et al.  A Comprehensive Three-dimensional Cortical Map of Vowel Space , 2011, Journal of Cognitive Neuroscience.

[20]  Ananthanarayan Krishnan,et al.  Human frequency-following responses: representation of steady-state synthetic vowels , 2002, Hearing Research.

[21]  D. Poeppel,et al.  Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. , 1996, Brain research. Cognitive brain research.

[22]  D. Poeppel,et al.  Auditory Cortex Accesses Phonological Categories: An MEG Mismatch Study , 2000, Journal of Cognitive Neuroscience.

[23]  J. Arezzo,et al.  Representation of the voice onset time (VOT) speech parameter in population responses within primary auditory cortex of the awake monkey. , 2003, The Journal of the Acoustical Society of America.

[24]  G. E. Peterson,et al.  Control Methods Used in a Study of the Vowels , 1951 .

[25]  Ping Li,et al.  Electrophysiological evidence of categorical perception of Chinese lexical tones in attentive condition , 2012, Neuroreport.

[26]  N. Kraus,et al.  Relationships between behavior, brainstem and cortical encoding of seen and heard speech in musicians and non-musicians , 2008, Hearing Research.

[27]  T. Elbert,et al.  Cortical representation of vowels reflects acoustic dissimilarity determined by formant frequencies. , 2003, Brain research. Cognitive brain research.

[28]  M. Carandini From circuits to behavior: a bridge too far? , 2012, Nature Neuroscience.

[29]  Alan R Palmer,et al.  Phase-locked responses to pure tones in the inferior colliculus. , 2006, Journal of neurophysiology.

[30]  Steven Greenberg,et al.  Neural temporal coding of low pitch. I. Human frequency-following responses to complex tones , 1987, Hearing Research.

[31]  T. Picton,et al.  Human auditory sustained potentials. II. Stimulus relationships. , 1978, Electroencephalography and clinical neurophysiology.

[32]  R. Burkard Human Auditory Evoked Potentials , 2010 .

[33]  D. Klatt,et al.  Analysis, synthesis, and perception of voice quality variations among female and male talkers. , 1990, The Journal of the Acoustical Society of America.

[34]  Steven A. Hillyard,et al.  Human Auditory Attention: A Central or Peripheral Process? , 1971, Science.

[35]  A M Liberman,et al.  Perception of the speech code. , 1967, Psychological review.

[36]  Gal Chechik,et al.  Reduction of Information Redundancy in the Ascending Auditory Pathway , 2006, Neuron.

[37]  R Kakigi,et al.  [Event-related brain potentials]. , 1997, Nihon rinsho. Japanese journal of clinical medicine.

[38]  M. Studdert-Kennedy,et al.  Speech perception deficits in poor readers: auditory processing or phonological coding? , 1997, Journal of experimental child psychology.

[39]  Christopher J. Smalt,et al.  Relationship between brainstem, cortical and behavioral measures relevant to pitch salience in humans , 2012, Neuropsychologia.

[40]  Erika Skoe,et al.  Perception of Speech in Noise: Neural Correlates , 2011, Journal of Cognitive Neuroscience.

[41]  N. Kraus,et al.  Subcortical differentiation of stop consonants relates to reading and speech-in-noise perception , 2009, Proceedings of the National Academy of Sciences.

[42]  N. Kraus,et al.  Musical Experience Limits the Degradative Effects of Background Noise on the Neural Processing of Sound , 2009, The Journal of Neuroscience.

[43]  E. Abberton Review of Marilyn May Vihman (1996) "Phonological Development: the Origins of Language in the Child", Blackwell, Oxford and Cambridge MA. , 1998 .

[44]  J. Kane,et al.  Brainstem Frequency-Following Responses and Cortical Event-Related Potentials during Attention , 1993, Perceptual and motor skills.

[45]  S. T. Buckland,et al.  An Introduction to the Bootstrap. , 1994 .

[46]  Christopher J. Plack,et al.  Subcortical Plasticity Following Perceptual Learning in a Pitch Discrimination Task , 2011, Journal of the Association for Research in Otolaryngology.

[47]  Erika Skoe,et al.  Neural Processing of Speech Sounds in ASD and First-Degree Relatives , 2010, Journal of Autism and Developmental Disorders.

[48]  C Pantev,et al.  Magnetic and electric brain activity evoked by the processing of tone and vowel stimuli , 1995, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[49]  N. Kraus,et al.  Subcortical encoding of sound is enhanced in bilinguals and relates to executive function advantages , 2012, Proceedings of the National Academy of Sciences.

[50]  M Steinschneider,et al.  Temporal encoding of the voice onset time phonetic parameter by field potentials recorded directly from human auditory cortex. , 1999, Journal of neurophysiology.

[51]  I R L Davies,et al.  Lateralization of categorical perception of color changes with color term acquisition , 2008, Proceedings of the National Academy of Sciences.

[52]  B. Auditory and phonetic memory codes in the discrimination of consonants and vowels * , 2022 .

[53]  A. Krishnan,et al.  Effects of reverberation on brainstem representation of speech in musicians and non-musicians , 2010, Brain Research.

[54]  T. Picton,et al.  The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. , 1987, Psychophysiology.

[55]  J. Werker,et al.  Speech perception in severely disabled and average reading children. , 1987, Canadian journal of psychology.

[56]  M. Scherg,et al.  Intracerebral Sources of Human Auditory-Evoked Potentials , 1999, Audiology and Neurotology.

[57]  P. D. Eimas,et al.  Speech Perception in Infants , 1971, Science.

[58]  Terence W. Picton,et al.  Envelope and spectral frequency-following responses to vowel sounds , 2008, Hearing Research.

[59]  Lee M. Miller,et al.  Methods to Eliminate Stimulus Transduction Artifact From Insert Earphones During Electroencephalography , 2012, Ear and hearing.

[60]  D. Poeppel,et al.  Processing of vowels in supratemporal auditory cortex , 1997, Neuroscience Letters.

[61]  E. Chang,et al.  Categorical Speech Representation in Human Superior Temporal Gyrus , 2010, Nature Neuroscience.

[62]  M. Scherg,et al.  A Source Analysis of the Late Human Auditory Evoked Potentials , 1989, Journal of Cognitive Neuroscience.

[63]  A M Liberman,et al.  A specialization for speech perception. , 1989, Science.

[64]  Colin Phillips,et al.  Levels of representation in the electrophysiology of speech perception , 2001, Cogn. Sci..

[65]  D. Poeppel,et al.  Dorsal and ventral streams: a framework for understanding aspects of the functional anatomy of language , 2004, Cognition.

[66]  Gavin M. Bidelman,et al.  Neural Correlates of Consonance, Dissonance, and the Hierarchy of Musical Pitch in the Human Brainstem , 2009, The Journal of Neuroscience.

[67]  Brian N. Pasley,et al.  Reconstructing Speech from Human Auditory Cortex , 2012, PLoS biology.

[68]  J. Rauschecker Parallel Processing in the Auditory Cortex of Primates , 1998, Audiology and Neurotology.

[69]  J. G. May Acoustic Factors that May Contribute to Categorical Perception , 1981 .

[70]  N. Kraus,et al.  Learning to Encode Timing: Mechanisms of Plasticity in the Auditory Brainstem , 2009, Neuron.

[71]  A. Krishnan,et al.  Musicians and tone-language speakers share enhanced brainstem encoding but not perceptual benefits for musical pitch , 2011, Brain and Cognition.

[72]  F. Keil,et al.  Categorical effects in the perception of faces , 1995, Cognition.

[73]  Christo Pantev,et al.  Sound Processing Hierarchy within Human Auditory Cortex , 2011, Journal of Cognitive Neuroscience.

[74]  T. Picton,et al.  Evoked potential audiometry. , 1976, The Journal of otolaryngology.

[75]  Katrina Agung,et al.  The use of cortical auditory evoked potentials to evaluate neural encoding of speech sounds in adults. , 2006, Journal of the American Academy of Audiology.

[76]  S. Peters,et al.  Neural Correlates of Categorical Perception in Learned Vocal Communication , 2009, Nature Neuroscience.

[77]  Gavin M. Bidelman,et al.  Neural representation of pitch salience in the human brainstem revealed by psychophysical and electrophysiological indices , 2010, Hearing Research.

[78]  Patrick J. F. Groenen,et al.  Modern Multidimensional Scaling: Theory and Applications , 2003 .

[79]  E. Formisano,et al.  Learning of New Sound Categories Shapes Neural Response Patterns in Human Auditory Cortex , 2012, The Journal of Neuroscience.

[80]  S. Harnad Categorical Perception: The Groundwork of Cognition , 1990 .

[81]  P Iverson,et al.  Mapping the perceptual magnet effect for speech using signal detection theory and multidimensional scaling. , 1995, The Journal of the Acoustical Society of America.

[82]  S. Hillyard,et al.  Electrical Signs of Selective Attention in the Human Brain , 1973, Science.

[83]  T W Picton,et al.  Human auditory evoked potentials. II. Effects of attention. , 1974, Electroencephalography and clinical neurophysiology.

[84]  J. T. Marsh,et al.  Far-field recorded frequency-following responses: evidence for the locus of brainstem sources. , 1975, Electroencephalography and clinical neurophysiology.

[85]  M. Dorman,et al.  Cortical auditory evoked potential correlates of categorical perception of voice-onset time. , 1999, The Journal of the Acoustical Society of America.

[86]  M. Scherg,et al.  Event‐Related Potentials and the Categorical Perception of Speech Sounds , 1995, Ear and hearing.

[87]  D. Bradley,et al.  Neural population code for fine perceptual decisions in area MT , 2005, Nature Neuroscience.

[88]  H Pratt,et al.  Sources of frequency following responses (FFR) in man. , 1977, Electroencephalography and clinical neurophysiology.

[89]  K. Stevens,et al.  Linguistic experience alters phonetic perception in infants by 6 months of age. , 1992, Science.

[90]  Terence W. Picton,et al.  Envelope Following Responses to Natural Vowels , 2006, Audiology and Neurotology.

[91]  M. Kilgard,et al.  Different timescales for the neural coding of consonant and vowel sounds. , 2013, Cerebral cortex.

[92]  C Alain,et al.  Selectively attending to auditory objects. , 2000, Frontiers in bioscience : a journal and virtual library.

[93]  Marilyn M. Vihman,et al.  Phonological Development , 2014 .

[94]  N. Kraus,et al.  Music training for the development of auditory skills , 2010, Nature Reviews Neuroscience.

[95]  Robert J. Zatorre,et al.  A role for the right superior temporal sulcus in categorical perception of musical chords , 2011, Neuropsychologia.