We propose a new method of obtaining features from speech signals for robust analysis and recognition: the non-uniform linear prediction (NLP) cepstrum. The objective is to derive a representation that suppresses speaker-dependent characteristics while preserving the linguistic quality of speech segments. The analysis is based on two principles. First, Bark frequency warping is performed on the LP spectrum to emulate the auditory spectrum. Whereas widely used methods such as mel-frequency and PLP analysis use the FFT spectrum as their basis for warping, NLP analysis uses the LP-based vocal-tract spectrum with glottal effects removed. Second, all-pole modeling (LP) is applied both before and after the warping. The pre-warp LP yields the vocal-tract spectrum, while the post-warp LP yields a smoothed, two-peak model of the warped spectrum. Experiments were conducted to test the effectiveness of the proposed feature in two tasks: identification/discrimination of vowels uttered by multiple speakers using linear discriminant analysis (LDA), and frame-based vowel recognition with a statistical model. In both cases, NLP analysis proved to be an effective tool for speaker-independent speech analysis and recognition.
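The two-stage analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the model orders (`pre_order=12`, `post_order=4`, the latter chosen on the assumption that a two-peak model corresponds to two complex pole pairs), the Zwicker-Terhardt style Bark approximation, and the warping by linear interpolation onto a uniform Bark grid are all assumptions made for illustration. The removal of glottal effects is omitted.

```python
import numpy as np

def levinson(r, order):
    """Levinson-Durbin: autocorrelation r[0..order] -> ([1, a1..ap], residual energy)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def hz_to_bark(f):
    """Bark approximation (assumed here; the source does not specify one)."""
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def warp_to_bark(power, fs):
    """Resample a one-sided power spectrum onto a uniform Bark grid."""
    bark = hz_to_bark(np.linspace(0.0, fs / 2.0, len(power)))
    grid = np.linspace(bark[0], bark[-1], len(power))
    return np.interp(grid, bark, power)

def lp_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1/A(z), A(z) = 1 + sum a_k z^-k."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]

def nlp_cepstrum_sketch(frame, fs, pre_order=12, post_order=4, n_ceps=8, n_fft=512):
    # Pre-warp LP: all-pole estimate of the spectral envelope.
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + pre_order]
    a, err = levinson(r, pre_order)
    power = err / np.abs(np.fft.rfft(a, n_fft)) ** 2
    # Bark warping of the LP spectrum (not of the FFT spectrum).
    warped = warp_to_bark(power, fs)
    # Post-warp LP: low-order all-pole fit to the warped spectrum.
    r_w = np.fft.irfft(warped)[:post_order + 1]
    a_w, _ = levinson(r_w, post_order)
    return lp_to_cepstrum(a_w, n_ceps)
```

Under these assumptions, a call such as `nlp_cepstrum_sketch(frame, fs=16000)` on a 25 ms frame returns eight cepstral coefficients that could serve as a feature vector for LDA or a statistical classifier.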