Pitch and spectral estimation of speech based on auditory synchrony model

This paper describes a system for processing sonorant regions of speech, motivated by knowledge of the human auditory system. The spectral representation is intended to reflect a proposed model for human auditory processing of speech, which takes advantage of synchrony in the nerve firing patterns to enhance formant peaks. The auditory model is also applied to pitch extraction, yielding a temporal pitch processor. The spectrum is derived from the outputs of a set of linear filters with critical bandwidths. Saturation and adaptation are incorporated for each filter independently. Each "spectral" coefficient is determined by weighting the amplitude response at that frequency by a measure of synchrony to the center frequency of the filter. Pitch is derived from a waveform generated by adding the rectified filter outputs across the frequency dimension. The spectral estimator and the pitch estimator are illustrated by processing pure tones and natural speech.
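The sketch below is a minimal illustration of the processing chain described above, not a reproduction of the paper's model: a fixed-Q Butterworth bandpass bank stands in for the critical-band filters, the saturation and adaptation stages are omitted, synchrony is approximated by the normalized autocorrelation of each rectified channel output at the lag of its center frequency, and pitch is taken from an autocorrelation peak of the summed rectified outputs. All function names (critical_band_filters, synchrony_spectrum, pitch_from_summed) and parameter choices are hypothetical.

```python
import numpy as np
from scipy.signal import butter, lfilter

def critical_band_filters(center_freqs, fs, q=4.0, order=2):
    """Bank of bandpass filters; a fixed Q is a crude stand-in for critical bandwidths."""
    bank = []
    for fc in center_freqs:
        bw = fc / q
        lo, hi = (fc - bw / 2) / (fs / 2), (fc + bw / 2) / (fs / 2)
        bank.append(butter(order, [lo, hi], btype="band"))
    return bank

def synchrony_spectrum(x, center_freqs, fs):
    """Weight each channel's mean rectified amplitude by a synchrony measure:
    the normalized autocorrelation of the rectified output at one period of
    the channel's center frequency. Also returns the summed rectified outputs."""
    spectrum = []
    summed = np.zeros_like(x)
    for (b, a), fc in zip(critical_band_filters(center_freqs, fs), center_freqs):
        y = lfilter(b, a, x)
        r = np.maximum(y, 0.0)            # half-wave rectification
        summed += r                       # add across the frequency dimension (for pitch)
        lag = int(round(fs / fc))         # one period of the center frequency
        num = np.dot(r[:-lag], r[lag:])
        den = np.dot(r, r) + 1e-12
        synchrony = max(num / den, 0.0)   # near 1 when the output is phase-locked to fc
        spectrum.append(np.mean(r) * synchrony)
    return np.array(spectrum), summed

def pitch_from_summed(summed, fs, fmin=60.0, fmax=600.0):
    """Pitch from the autocorrelation peak of the summed rectified outputs,
    searched over a plausible F0 range."""
    lags = np.arange(int(fs / fmax), int(fs / fmin))
    ac = np.array([np.dot(summed[:-l], summed[l:]) for l in lags])
    return fs / lags[np.argmax(ac)]

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 0.1, 1.0 / fs)
    tone = np.sin(2 * np.pi * 500 * t)        # pure-tone test signal
    cfs = np.geomspace(200, 4000, 24)         # channel center frequencies
    spec, summed = synchrony_spectrum(tone, cfs, fs)
    print("peak channel (Hz):", cfs[np.argmax(spec)])
    print("estimated pitch (Hz):", pitch_from_summed(summed, fs))
```

Run on a 500 Hz tone, the peak of the synchrony-weighted spectrum falls in the channel nearest 500 Hz and the pitch estimate recovers the tone's period; applying the same chain frame by frame to sonorant regions of natural speech is the use case the paper targets.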