Temporal non-place information in the auditory-nerve firing patterns as a front-end for speech recognition in a noisy environment

A computational model, based upon the temporal discharge patterns of auditory-nerve fibers, is described and compared with the more traditional Fourier transform method of spectral analysis. The model produces a frequency-domain representation of the input signal based on the ensemble histogram of interspike intervals generated by a simulated array of auditory-nerve fibers. The nerve-fiber discharge mechanism is modeled as a multi-level crossing detector at the output of each cochlear filter. The model incorporates 85 cochlear filters, equally spaced on a log-frequency scale between 200 and 3200 Hz. The level crossings are measured at positive threshold levels which are pseudo-randomly distributed. The resulting “Ensemble Interval Histogram” (EIH) spectrum has two principal properties: (1) fine spectral details, which are well preserved in the low-frequency region, are poorly delineated in the high-frequency portion of the spectrum, and (2) the EIH representation withstands the addition of noise to a far higher degree than the traditional Fourier power spectrum. The capability of the EIH model to preserve relevant phonetic information in quiet and in noisy acoustic environments was measured quantitatively using the EIH as a front-end to a Dynamic Time Warping, speaker-dependent, isolated-word recognizer. The database consisted of a 39-word alpha-digits vocabulary spoken by two male and two female speakers, over a range of signal-to-noise ratios. In the noise-free case, the performance of the EIH-based system is comparable to that of a conventional Fourier-based front-end. In the presence of noise, however, the performance of the EIH-based system is superior: its recognition scores drop more slowly than those of the Fourier-based system as the noise level increases. As a consequence, the advantage of the EIH front-end grows as the signal-to-noise ratio decreases.
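The processing chain described in the abstract (filter bank, multi-level crossing detection, ensemble histogram of inverse intervals) can be illustrated with a minimal sketch. It is not the author's implementation: the Gaussian band-pass filters, the relative bandwidth of 0.1, the five levels per channel, the threshold range, the 64 log-spaced histogram bins, and the function name `eih_sketch` are all assumptions made for illustration. Only the 85 log-spaced channels between 200 and 3200 Hz and the positive, pseudo-randomly distributed thresholds come from the text.

```python
import numpy as np

def eih_sketch(signal, fs, n_filters=85, f_lo=200.0, f_hi=3200.0,
               n_levels=5, n_bins=64, seed=0):
    """Toy Ensemble Interval Histogram; returns (bin_edges_hz, counts)."""
    rng = np.random.default_rng(seed)
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # 85 center frequencies, equally spaced on a log scale (from the abstract).
    centers = np.geomspace(f_lo, f_hi, n_filters)
    # Log-spaced histogram bins for inverse intervals (assumed layout).
    bin_edges = np.geomspace(f_lo / 2.0, f_hi * 2.0, n_bins + 1)
    counts = np.zeros(n_bins)
    peak = np.max(np.abs(signal))
    if peak == 0.0:
        return bin_edges, counts
    for fc in centers:
        # Assumed stand-in for a cochlear filter: Gaussian band-pass
        # applied in the frequency domain, bandwidth proportional to fc.
        gain = np.exp(-0.5 * ((freqs - fc) / (0.1 * fc)) ** 2)
        y = np.fft.irfft(spectrum * gain, n=n)
        # Positive thresholds, pseudo-randomly drawn (range is assumed).
        levels = rng.uniform(0.05, 0.8, n_levels) * peak
        for level in levels:
            # Indices where the filter output crosses the level upward.
            i = np.nonzero((y[:-1] < level) & (y[1:] >= level))[0]
            if len(i) < 2:
                continue
            # Linear interpolation gives sub-sample crossing times.
            frac = (level - y[i]) / (y[i + 1] - y[i])
            times = (i + frac) / fs
            # Each interval between successive crossings votes for 1/interval.
            inv_intervals = 1.0 / np.diff(times)
            counts += np.histogram(inv_intervals, bins=bin_edges)[0]
    return bin_edges, counts

# Usage: a pure 1 kHz tone sampled at 16 kHz.
fs = 16000
t = np.arange(1600) / fs
edges, hist = eih_sketch(np.sin(2 * np.pi * 1000.0 * t), fs)
k = int(np.argmax(hist))
print(f"peak bin spans {edges[k]:.0f} to {edges[k + 1]:.0f} Hz")
```

In this sketch the thresholds are absolute rather than scaled per channel, so channels whose output stays below every level contribute no intervals. That is one plausible reading of why the histogram is dominated by channels carrying strong signal components, but the exact threshold distribution used in the paper is not given in the abstract.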
