Parallel auditory filtering by sustained and transient channels separates coarticulated vowels and consonants

A neural model of peripheral auditory processing is described and used to separate features of coarticulated vowels and consonants. After preprocessing of speech via a filterbank, the model splits into two parallel channels, a sustained channel and a transient channel. The sustained channel is sensitive to relatively stable parts of the speech waveform, notably synchronous properties of the vocalic portion of the stimulus. It extends the dynamic range of eighth nerve filters using coincidence detectors that combine operations of raising to a power, rectification, delay, multiplication, time averaging, and preemphasis. The transient channel is sensitive to critical features at the onsets and offsets of speech segments. It is built up from fast excitatory neurons that are modulated by slow inhibitory interneurons. These units are combined over high-frequency and low-frequency ranges using operations of rectification, normalization, multiplicative gating, and opponent processing. Detectors sensitive to frication and to onset or offset of stop consonants and vowels are described. Model properties are characterized by mathematical analysis and computer simulations. Neural analogs of model cells in the cochlear nucleus and inferior colliculus are noted, as are psychophysical data about perception of CV syllables that may be explained by the sustained-transient channel hypothesis. The proposed sustained and transient processing seems to be an auditory analog of the sustained and transient processing that is known to occur in vision.
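As a rough illustration of the transient-channel idea, the sketch below pairs a fast excitatory unit with a slow inhibitory unit and rectifies their difference in both directions to obtain opponent onset and offset responses. It is not the model's actual equations, which the abstract does not give: the leaky-integrator dynamics, the subtractive form of the inhibition, the rate constants, and the names `leaky_integrate` and `transient_responses` are all assumptions made for illustration.

```python
def leaky_integrate(signal, rate, dt=0.001):
    """Euler integration of dx/dt = rate * (s - x), starting from x = 0."""
    x = 0.0
    out = []
    for s in signal:
        x += dt * rate * (s - x)
        out.append(x)
    return out


def transient_responses(envelope, fast_rate=500.0, slow_rate=20.0, dt=0.001):
    """Return (onset, offset) responses to a non-negative amplitude envelope.

    Assumed form: a fast excitatory trace minus a slow inhibitory trace,
    half-wave rectified, gives the onset response; the opposite difference
    gives the opponent offset response.
    """
    fast = leaky_integrate(envelope, fast_rate, dt)   # fast excitatory unit
    slow = leaky_integrate(envelope, slow_rate, dt)   # slow inhibitory unit
    onset = [max(f - s, 0.0) for f, s in zip(fast, slow)]
    offset = [max(s - f, 0.0) for f, s in zip(fast, slow)]
    return onset, offset


# Toy envelope sampled at 1 kHz: 100 ms silence, 300 ms steady sound, 200 ms silence.
envelope = [0.0] * 100 + [1.0] * 300 + [0.0] * 200
onset, offset = transient_responses(envelope)

# Sample indices at which each response is largest.
onset_peak = max(range(len(onset)), key=onset.__getitem__)
offset_peak = max(range(len(offset)), key=offset.__getitem__)
print("onset peak at sample", onset_peak)
print("offset peak at sample", offset_peak)
```

With these assumed parameters the onset response is large only just after the sound begins and decays toward zero during the steady portion, while the offset response appears only just after the sound ends. This is the qualitative behavior the abstract attributes to the transient channel, in contrast to the sustained channel's sensitivity to the stable vocalic portion.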
