Use of generalized dynamic feature parameters for speech recognition: maximum likelihood and minimum classification error approaches

In this study we implemented a speech recognizer based on the integrated view of the speech preprocessing and speech modeling problems in recognizer design, first proposed by Deng (see IEEE Signal Processing Letters, vol. 1, no. 4, pp. 66-69, 1994). The integrated model we developed generalizes the conventional, widely used delta-parameter technique, which has been confined strictly to the preprocessing domain, in two significant ways. First, the new model contains state-dependent weighting functions that transform static speech features into dynamic ones in a slowly time-varying manner. Second, we develop novel learning algorithms for the model, based on maximum likelihood and minimum classification error, that allow joint optimization of the state-dependent weighting functions and the remaining conventional HMM parameters. Experimental results from a standard TIMIT phonetic classification task provide preliminary evidence for the effectiveness of our new, general approaches to using the dynamic characteristics of speech spectra.
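
To make the first generalization concrete, the following is a minimal sketch, not the formulation used in the study. It assumes that a dynamic feature is a weighted sum of neighbouring static frames, that the conventional delta technique corresponds to one fixed set of regression weights, and that the generalized model selects a separate weight vector per HMM state. The function names, the scalar per-offset weights, the edge padding, and the externally supplied frame-to-state alignment are all illustrative assumptions; in the model described above the weights are learned jointly with the other HMM parameters.

```python
import numpy as np

def delta_weights(K):
    """Fixed regression weights of the conventional delta technique (window of +/-K frames)."""
    k = np.arange(-K, K + 1, dtype=float)
    return k / np.sum(k ** 2)

def dynamic_features(static, weights):
    """Conventional case: d[t] = sum_k w[k] * x[t+k], same weights for every frame.

    static: (T, D) static features; weights: (2K+1,).
    Edges are padded by repeating the first/last frame (an assumption).
    """
    K = (len(weights) - 1) // 2
    padded = np.pad(static, ((K, K), (0, 0)), mode="edge")
    T = static.shape[0]
    out = np.zeros(static.shape, dtype=float)
    for j, w in enumerate(weights):
        # padded[j + t] holds x[t + (j - K)], i.e. offset k = j - K
        out += w * padded[j:j + T]
    return out

def state_dependent_dynamic_features(static, state_weights, states):
    """Generalized case (hypothetical form): the weight vector depends on the state of frame t.

    state_weights: (S, 2K+1), one weight vector per state.
    states: (T,) integer state index per frame, assumed given here.
    """
    K = (state_weights.shape[1] - 1) // 2
    padded = np.pad(static, ((K, K), (0, 0)), mode="edge")
    out = np.zeros(static.shape, dtype=float)
    for t in range(static.shape[0]):
        window = padded[t:t + 2 * K + 1]          # frames t-K .. t+K, shape (2K+1, D)
        out[t] = state_weights[states[t]] @ window
    return out
```

If every state is given the same delta regression weights, the state-dependent version reduces to the conventional delta parameters, which is the sense in which this sketch treats the delta technique as a special case.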