Speech Recognition with Primarily Temporal Cues

Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants, vowels, and words in simple sentences improved markedly as the number of bands increased; high speech recognition performance was obtained with only three bands of modulated noise. Thus, the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
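The manipulation described above (splitting speech into broad bands, extracting each band's temporal envelope, and using it to modulate noise of the same bandwidth) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The brick-wall FFT filters, the half-wave rectification, the 50 Hz envelope cutoff, and the names `fft_bandpass` and `noise_vocode` are all assumptions made for the example; the abstract does not specify band edges or filter details, so the caller supplies them.

```python
import numpy as np


def fft_bandpass(x, fs, lo, hi):
    """Keep only FFT bins in [lo, hi) Hz (idealized brick-wall filter; an assumption)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def noise_vocode(speech, fs, band_edges, env_cutoff=50.0, seed=0):
    """Replace the fine structure in each band with envelope-modulated noise.

    band_edges : increasing frequencies in Hz; N edges define N - 1 bands
                 (hypothetical values, chosen by the caller).
    env_cutoff : low-pass cutoff in Hz for envelope smoothing (assumed value).
    """
    speech = np.asarray(speech, dtype=float)
    rng = np.random.default_rng(seed)
    out = np.zeros_like(speech)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # 1. Isolate one broad analysis band of the speech signal.
        band = fft_bandpass(speech, fs, lo, hi)
        # 2. Temporal envelope: half-wave rectify, then low-pass filter.
        env = fft_bandpass(np.maximum(band, 0.0), fs, 0.0, env_cutoff)
        env = np.maximum(env, 0.0)  # clip small negative ripple from filtering
        # 3. Noise carrier restricted to the same bandwidth.
        carrier = fft_bandpass(rng.standard_normal(len(speech)), fs, lo, hi)
        # 4. Modulate the noise, re-filter to the band, and sum across bands.
        out += fft_bandpass(env * carrier, fs, lo, hi)
    return out
```

For a three-band condition, one might call `noise_vocode(speech, fs, [lo, f1, f2, hi])` with four band edges of one's own choosing. Within each band the output carries the slow amplitude fluctuations of the speech but none of its spectral detail, which is the degradation the abstract describes.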
