Neural dynamics of perceptual order and context effects for variable-rate speech syllables

How does the brain extract invariant properties of variable-rate speech? A neural model, called PHONET, is developed to explain aspects of this process and, along the way, data about perceptual context effects. For example, in consonant-vowel (CV) syllables, such as /ba/ and /wa/, an increase in the duration of the vowel can cause a switch in the percept of the preceding consonant from /w/ to /b/ (J. L. Miller & Liberman, 1979). The frequency extent of the initial formant transitions of fixed duration also influences the percept (Schwab, Sawusch, & Nusbaum, 1981). PHONET quantitatively simulates over 98% of the variance in these data, using a single set of parameters. The model also qualitatively explains many data about other perceptual context effects. In the model, C and V inputs are filtered by parallel auditory streams that respond preferentially to the transient and sustained properties of the acoustic signal before being stored in parallel working memories. A lateral inhibitory network of onset- and rate-sensitive cells in the transient channel extracts measures of frequency transition rate and extent. Greater activation of the transient stream can increase the processing rate in the sustained stream via a cross-stream automatic gain control interaction. The stored activities across these gain-controlled working memories provide a basis for rate-invariant perception, since the transient-to-sustained gain control tends to preserve the relative activities across the transient and sustained working memories as speech rate changes. Comparisons with alternative models tested suggest that the fit cannot be attributed to the simplicity of the data. Brain analogues of model cell types are described.
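The gain-control idea in the abstract can be illustrated with a deliberately simplified toy calculation. This is not the PHONET model or its equations: the function names, the constants `k` and `fixed_gain`, the first-order integrator, and the stimulus values are all assumptions chosen for illustration. The sketch assumes that the transient stream reports the formant transition rate, that this rate multiplicatively sets the integration rate of the sustained stream, and that the sustained activity is stored when the vowel ends.

```python
import math


def sustained_activity(freq_extent_hz, transition_ms, vowel_ms,
                       gain_control=True, k=0.001, fixed_gain=0.015):
    """Toy stored sustained-stream activity at vowel offset.

    Hypothetical illustration only; not the PHONET equations.
    """
    # Assumed transient measure: frequency transition rate in Hz per ms.
    transient = freq_extent_hz / transition_ms
    # Cross-stream gain: proportional to transient activity when enabled,
    # otherwise a constant that ignores the transient stream.
    gain = k * transient if gain_control else fixed_gain
    # Closed-form solution of dx/dt = gain * (1 - x), x(0) = 0, at t = vowel_ms.
    return 1.0 - math.exp(-gain * vowel_ms)


def compress(transition_ms, vowel_ms, rate):
    """Uniformly speed up the syllable by `rate` (2.0 means twice as fast)."""
    return transition_ms / rate, vowel_ms / rate


# Assumed stimulus: a 600 Hz transition over 40 ms followed by a 100 ms vowel.
slow = (40.0, 100.0)
fast = compress(*slow, rate=2.0)

for label, use_gain in (("gain control", True), ("fixed gain", False)):
    a_slow = sustained_activity(600.0, *slow, gain_control=use_gain)
    a_fast = sustained_activity(600.0, *fast, gain_control=use_gain)
    print(f"{label}: slow={a_slow:.3f} fast={a_fast:.3f}")
```

In this toy setting, speeding up the syllable shortens the vowel but raises the transition rate by the same factor, so the product of gain and vowel duration, and hence the stored sustained activity, is unchanged. With a fixed gain the stored activity drops as the vowel shortens. The actual model uses interacting networks and fitted parameters, so this shows only the direction of the effect.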

[1]  P. Denes Effect of Duration on the Perception of Voicing , 1955 .

[2]  A. Liberman,et al.  Tempo of frequency change as a cue for distinguishing classes of speech sounds. , 1956, Journal of experimental psychology.

[3]  H. Scheffé,et al.  The Analysis of Variance , 1960 .

[4]  A. Hodgkin The conduction of the nervous impulse , 1964 .

[5]  Calyampudi R. Rao,et al.  Linear statistical inference and its applications , 1965 .

[6]  W. C. Guenther,et al.  Analysis of variance , 1968, The Mathematical Gazette.

[7]  Neil A. Macmillan,et al.  Detection and recognition of increments and decrements in auditory intensity , 1971 .

[8]  Neil A. Macmillan,et al.  Detection and recognition of intensity changes in tone and noise: The detection-recognition disparity , 1973 .

[9]  S. Grossberg Contour Enhancement , Short Term Memory , and Constancies in Reverberating Neural Networks , 1973 .

[10]  Calyampudi R. Rao,et al.  Linear Statistical Inference and Its Applications. , 1975 .

[11]  R. Britt,et al.  Synaptic events and discharge patterns of cochlear nucleus cells. I. Steady-frequency tone bursts. , 1976, Journal of neurophysiology.

[12]  R. Britt,et al.  Synaptic events and discharge patterns of cochlear nucleus cells. II. Frequency-modulated tones. , 1976, Journal of neurophysiology.

[13]  B. Repp Perceptual integration and differentiation of spectral cues for intervocalic stop consonants , 1978, Perception & psychophysics.

[14]  M. Sachs,et al.  Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory-nerve fibers. , 1979, The Journal of the Acoustical Society of America.

[15]  M. Sachs,et al.  Encoding of steady-state vowels in the auditory nerve: representation in terms of discharge rate. , 1979, The Journal of the Acoustical Society of America.

[16]  A. Liberman,et al.  Some effects of later-occurring information on the perception of stop consonant and semivowel , 1979, Perception & psychophysics.

[17]  D. Massaro,et al.  The contribution of vowel duration, F0 contour, and frication duration as cues to the /juz/-/jus/ distinction , 1980, Perception & psychophysics.

[18]  Q Summerfield,et al.  Information in speech: observations on the perception of [s]-stop clusters. , 1980, Journal of experimental psychology. Human perception and performance.

[19]  V. Mann,et al.  Influence of vocalic context on perception of the [∫]-[s] distinction , 1978 .

[20]  S. Grossberg How does a brain build a cognitive code , 1980 .

[21]  Q. Summerfield Articulatory rate and perceptual constancy in phonetic perception. , 1981, Journal of experimental psychology. Human perception and performance.

[22]  E. C. Schwab,et al.  The role of second formant transitions in the stop-semivowel distinction , 1981, Perception & psychophysics.

[23]  Peter D. Eimas,et al.  Perspectives on the study of speech , 1981 .

[24]  R. Luce,et al.  Evidence from auditory simple reaction times for both change and level detectors , 1982, Perception & psychophysics.

[25]  J. Pickles An Introduction to the Physiology of Hearing , 1982 .

[26]  W Rhode,et al.  Auditory physiology. , 1982, Science.

[27]  J. T. Hogan,et al.  Vowel identification: orthographic, perceptual, and acoustic aspects. , 1982, The Journal of the Acoustical Society of America.

[28]  Stephen Grossberg,et al.  Studies of mind and brain , 1982 .

[29]  R. Port,et al.  Consonant/vowel ratio as a cue for voicing in English , 1982, Perception & psychophysics.

[30]  Stephen Grossberg,et al.  A Theory of Human Memory: Self-Organization and Performance of Sensory-Motor Codes, Maps, and Plans , 1982 .

[31]  P Howell,et al.  Production and perception of rise time in the voiceless affricate/fricative distinction. , 1983, The Journal of the Acoustical Society of America.

[32]  D. Pisoni,et al.  Perception of the duration of rapid spectrum changes in speech and nonspeech signals , 1983, Perception & psychophysics.

[33]  B. Delgutte Speech coding in the auditory nerve: II. Processing schemes for vowel-like sounds. , 1984, The Journal of the Acoustical Society of America.

[34]  B. Delgutte,et al.  Speech coding in the auditory nerve: I. Vowel-like sounds. , 1984, The Journal of the Acoustical Society of America.

[35]  S E Blumstein,et al.  On the role of the amplitude envelope for the perception of [b] and [w]. , 1984, The Journal of the Acoustical Society of America.

[36]  Shihab A. Shamma,et al.  Patterns of inhibition in auditory cortical cells in awake squirrel monkeys , 1985, Hearing Research.

[37]  K M Berg,et al.  Temporal masking level differences for transients: Further evidence for a short-term integrator , 1985, Perception & psychophysics.

[38]  David Zipser,et al.  Feature Discovery by Competitive Learning , 1986, Cogn. Sci.

[39]  S. Blumstein,et al.  Limitations of context conditioned effects in the perception of [b] and [w] , 1985, Perception & psychophysics.

[40]  Stephen Grossberg,et al.  The Adaptive Self-organization of Serial Order in Behavior: Speech, Language, and Motor Control , 1986 .

[41]  Eileen C. Schwab,et al.  Pattern recognition by humans and machines , 1986 .

[42]  W. S. Rhode,et al.  Physiological studies on neurons in the dorsal cochlear nucleus of cat. , 1986, Journal of neurophysiology.

[43]  S. Grossberg,et al.  Neural dynamics of attention switching and temporal-order information in short-term memory , 1988, Memory & cognition.

[44]  M Studdert-Kennedy,et al.  The stop-glide distinction: acoustic analysis and perceptual effect of variation in syllable amplitude envelope for initial /b/ and /w/. , 1986, The Journal of the Acoustical Society of America.

[45]  S. Grossberg The Adaptive Self-Organization of Serial Order in Behavior: Speech, Language, And Motor Control , 1987 .

[46]  Stephen Grossberg,et al.  Speech Perception and Production by a Self-Organizing Neural Network. , 1987 .

[47]  S Grossberg,et al.  Masking fields: a massively parallel neural architecture for learning, recognizing, and predicting multiple groupings of patterned data. , 1987, Applied optics.

[48]  Stephen Grossberg,et al.  Neural dynamics of word recognition and recall: attentional priming, learning, and resonance. , 1986, Psychological Review.

[49]  Stephen Grossberg,et al.  Neural dynamics of speech and language coding: developmental programs, perceptual grouping, and competition for short-term memory. , 1986, Human neurobiology.

[50]  William H. Press,et al.  Numerical Recipes in FORTRAN - The Art of Scientific Computing, 2nd Edition , 1987 .

[51]  D. C. Van Essen,et al.  Concurrent processing streams in monkey visual cortex , 1988, Trends in Neurosciences.

[52]  Teuvo Kohonen,et al.  Self-organization and associative memory: 3rd edition , 1989 .

[53]  F. A. Seiler,et al.  Numerical Recipes in C: The Art of Scientific Computing , 1989 .

[54]  R. Diehl,et al.  An auditory basis for the stimulus-length effect in the perception of stops and glides. , 1989, The Journal of the Acoustical Society of America.

[55]  J. L. Miller,et al.  Effect of speaking rate on the perceptual structure of a phonetic category , 1989, Perception & psychophysics.

[56]  A quantitative description of membrane current and its application to conduction and excitation in nerve. 1952. , 1990, Bulletin of mathematical biology.

[57]  A. Hodgkin,et al.  A quantitative description of membrane current and its application to conduction and excitation in nerve , 1990 .

[58]  S. Harnad Categorical Perception: The Groundwork of Cognition , 1990 .

[59]  R. Schmidt,et al.  Progress in Sensory Physiology , 1991, Progress in Sensory Physiology.

[60]  Bahram Nabet,et al.  Sensory neural networks - lateral inhibition , 1991 .

[61]  V. Mann,et al.  Perceptual order and the effect of vocalic context on fricative perception , 1991, Perception & psychophysics.

[62]  S. Grossberg,et al.  Pattern Recognition by Self-Organizing Neural Networks , 1991 .

[63]  R L Diehl,et al.  Formant Transition Duration and Amplitude Rise Time as Cues to the Stop/Glide Distinction , 1991, The Quarterly journal of experimental psychology. A, Human experimental psychology.

[64]  J. L. Miller,et al.  Phonetic prototypes: influence of place of articulation and speaking rate on the internal structure of voicing categories. , 1992, The Journal of the Acoustical Society of America.

[65]  Stephen Grossberg,et al.  Working Memory Networks for Learning Temporal Order with Application to Three-Dimensional Visual Object Recognition , 1992, Neural Computation.

[66]  R. Desimone,et al.  Activity of neurons in anterior inferior temporal cortex during a short- term memory task , 1993, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[67]  J L Miller,et al.  Limits on the limitations of context-conditioned effects in the perception of [b] and [w] , 1993, Perception & psychophysics.

[68]  D. P. Phillips Neural Representation of Stimulus Times in the Primary Auditory Cortex , 1993, Annals of the New York Academy of Sciences.

[69]  D. Massaro,et al.  The paradigm and the fuzzy logical model of perception are alive and well. , 1993, Journal of experimental psychology. General.

[70]  S. Grossberg,et al.  Normal and amnesic learning, recognition and memory by a neural model of cortico-hippocampal interactions , 1993, Trends in Neurosciences.

[71]  J. H. Casseday,et al.  Neural tuning for sound duration: role of inhibitory mechanisms in the inferior colliculus. , 1994, Science.

[72]  F. de Ribaupierre,et al.  Changes of single unit activity in the cat's auditory thalamus and cortex associated to different anesthetic conditions , 1994, Neuroscience Research.

[73]  J P Rauschecker,et al.  Processing of frequency-modulated sounds in the cat's anterior auditory field. , 1994, Journal of neurophysiology.

[74]  S. Shamma,et al.  Ripple Analysis in Ferret Primary Auditory Cortex. I. Response Characteristics of Single Units to Sinusoidally Rippled Spectra , 1994 .

[75]  A van Wieringen,et al.  Frequency and duration discrimination of short first-formant speechlike transitions. , 1994, The Journal of the Acoustical Society of America.

[76]  Gerald Sommer,et al.  Pattern Recognition by Self-Organizing Neural Networks , 1994 .

[77]  J L Miller,et al.  The influence of sentential speaking rate on the internal structure of phonetic categories. , 1994, The Journal of the Acoustical Society of America.

[78]  Shihab A. Shamma,et al.  Ripple Analysis in Ferret Primary Auditory Cortex. 3. Prediction of Unit Responses to Arbitrary Spectral Profiles , 1995 .

[79]  L. Pols,et al.  Discrimination of single and complex consonant–vowel‐ and vowel–consonant‐like formant transitions , 1995 .

[80]  S. Grossberg The Attentive Brain , 1995 .

[81]  P. Jusczyk,et al.  The cocktail party effect in infants , 1996 .

[82]  J. Sawusch,et al.  Perceptual normalization for speaking rate: Effects of temporal distance , 1996, Perception & psychophysics.

[83]  J. Ostwald,et al.  Responses to exponential frequency modulations in the rat inferior colliculus , 1996, Hearing Research.

[84]  P. Goldman-Rakic Regional and cellular fractionation of working memory. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[85]  S. Grossberg,et al.  Neural dynamics of variable-rate speech categorization. , 1997, Journal of experimental psychology. Human perception and performance.

[86]  Stephen Grossberg,et al.  Parallel auditory filtering by sustained and transient channels separates coarticulated vowels and consonants , 1997, IEEE Trans. Speech Audio Process.

[87]  S C Rao,et al.  Integration of what and where in the primate prefrontal cortex. , 1997, Science.

[88]  P. Goldman-Rakic,et al.  Differential Activation of the Caudate Nucleus in Primates Performing Spatial and Nonspatial Working Memory Tasks , 1997, The Journal of Neuroscience.

[89]  J P Rauschecker,et al.  Processing of frequency-modulated sounds in the cat's posterior auditory field. , 1994, Journal of neurophysiology.

[90]  R. Parasuraman The attentive brain , 1998 .

[91]  P. Todd,et al.  Musical networks: Parallel distributed perception and performance , 1999 .

[92]  Stephen Grossberg,et al.  Pitch-based streaming in auditory perception , 1999 .