This paper describes a system for processing sonorant regions of speech, motivated by knowledge of the human auditory system. The spectral representation is intended to reflect a proposed model of human auditory processing of speech, which exploits synchrony in the nerve firing patterns to enhance formant peaks. The auditory model is also applied to pitch extraction, yielding a temporal pitch processor. The spectrum is derived from the outputs of a set of linear filters with critical bandwidths. Saturation and adaptation are incorporated for each filter independently. Each "spectral" coefficient is determined by weighting the amplitude response at that frequency by a measure of synchrony to the center frequency of the filter. Pitch is derived from a waveform generated by adding the rectified filter outputs across the frequency dimension. Both the spectral estimator and the pitch estimator are illustrated by processing pure tones and natural speech.
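The pipeline sketched in the abstract (a bank of band-pass filters, a synchrony-weighted spectrum, and a pitch waveform formed by summing rectified filter outputs) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the Gaussian-shaped FFT-domain filters stand in for the critical-band filters, the saturation/adaptation stages are omitted, and the synchrony measure (normalized autocorrelation at a one-period lag of each channel's center frequency) and all function names and parameters are illustrative assumptions.

```python
import numpy as np

def filterbank(x, fs, centers, q=8.0):
    """FFT-domain Gaussian band-pass filter bank: an illustrative
    stand-in for critical-band filters (bandwidth = fc / q)."""
    n = len(x)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    outs = []
    for fc in centers:
        H = np.exp(-0.5 * ((freqs - fc) / (fc / q)) ** 2)
        outs.append(np.fft.irfft(X * H, n))
    return np.array(outs)

def synchrony_spectrum(bands, fs, centers):
    """One 'spectral' coefficient per channel: the channel's RMS
    amplitude weighted by its synchrony to its own center frequency
    (normalized autocorrelation at lag = one period of fc)."""
    spec = []
    for y, fc in zip(bands, centers):
        lag = int(round(fs / fc))
        num = np.dot(y[:-lag], y[lag:])
        den = np.dot(y, y) + 1e-12
        sync = max(num / den, 0.0)
        spec.append(np.sqrt(den / len(y)) * sync)
    return np.array(spec)

def pitch_estimate(bands, fs, fmin=60.0, fmax=400.0):
    """Sum half-wave-rectified channel outputs across frequency, then
    pick the dominant autocorrelation lag as the pitch period."""
    summed = np.maximum(bands, 0.0).sum(axis=0)
    summed -= summed.mean()
    ac = np.correlate(summed, summed, mode="full")[len(summed) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

# Demonstration on a pure tone, as in the paper's illustrations.
fs = 8000
t = np.arange(int(0.1 * fs)) / fs
tone = np.sin(2 * np.pi * 200.0 * t)
centers = np.array([100.0 * 2 ** (k / 4) for k in range(16)])  # ~100-1350 Hz
bands = filterbank(tone, fs, centers)
spec = synchrony_spectrum(bands, fs, centers)
f0 = pitch_estimate(bands, fs)
print("peak channel (Hz):", centers[int(np.argmax(spec))])
print("estimated pitch (Hz):", round(f0, 1))
```

For a 200 Hz tone, the synchrony-weighted spectrum peaks in the channel centered at the tone frequency, and the summed-rectified-waveform autocorrelation recovers the 5 ms pitch period.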