In this study we implemented a speech recognizer based on the integrated view of the speech preprocessing and speech modeling problems in recognizer design, first proposed by Deng (see IEEE Signal Processing Letters, vol. 1, no. 4, pp. 66-69, 1994). The integrated model we developed generalizes the widely used delta-parameter technique, which has so far been confined strictly to the preprocessing domain, in two significant ways. First, the new model contains state-dependent weighting functions that transform static speech features into dynamic ones in a slowly time-varying manner. Second, novel learning algorithms based on maximum likelihood and minimum classification error are developed for the model; these allow joint optimization of the state-dependent weighting functions and the remaining conventional HMM parameters. Experimental results on a standard TIMIT phonetic classification task provide preliminary evidence for the effectiveness of our new, general approaches to the use of the dynamic characteristics of speech spectra.
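The contrast between the conventional delta-parameter technique and state-dependent weighting can be illustrated with a minimal sketch. It assumes that dynamic features are a weighted sum of neighboring static frames, that the fixed weights follow the common regression formula, and that edge frames are replicated. The function names, the window length, and the example weights are hypothetical and are not taken from the paper, whose weighting functions are learned jointly with the HMM parameters.

```python
from typing import Dict, List, Sequence

Frame = List[float]


def regression_weights(K: int) -> List[float]:
    """Fixed delta weights w_k = k / sum_j j^2 for k = -K..K (common convention)."""
    denom = sum(j * j for j in range(-K, K + 1))
    return [k / denom for k in range(-K, K + 1)]


def weighted_dynamic(frames: Sequence[Frame], t: int, weights: Sequence[float]) -> Frame:
    """Dynamic feature at time t: weighted sum of static frames t-K..t+K."""
    K = (len(weights) - 1) // 2
    last = len(frames) - 1
    out = [0.0] * len(frames[0])
    for k, w in zip(range(-K, K + 1), weights):
        idx = min(max(t + k, 0), last)  # assumption: replicate edge frames
        for d, value in enumerate(frames[idx]):
            out[d] += w * value
    return out


def conventional_delta(frames: Sequence[Frame], K: int = 2) -> List[Frame]:
    """Preprocessing-only delta features: one fixed weight set for all frames."""
    w = regression_weights(K)
    return [weighted_dynamic(frames, t, w) for t in range(len(frames))]


def state_dependent_dynamic(
    frames: Sequence[Frame],
    states: Sequence[int],
    state_weights: Dict[int, Sequence[float]],
) -> List[Frame]:
    """Generalization: the weight set is chosen by the HMM state aligned to each frame."""
    return [
        weighted_dynamic(frames, t, state_weights[states[t]])
        for t in range(len(frames))
    ]


if __name__ == "__main__":
    static = [[0.0], [1.0], [4.0], [9.0], [16.0]]
    alignment = [0, 0, 1, 1, 0]  # hypothetical frame-to-state alignment
    weights = {0: regression_weights(1), 1: [-1.0, 1.0, 0.0]}  # hypothetical values
    print(conventional_delta(static, K=1))
    print(state_dependent_dynamic(static, alignment, weights))
```

In this sketch the state-dependent weights are supplied by hand. In the integrated model they would instead be estimated together with the remaining HMM parameters, so the weights change only when the state changes, which is one reading of "slowly time-varying".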
[1] Li Deng, et al. Large vocabulary word recognition using context-dependent allophonic hidden Markov models. 1990.
[2] Biing-Hwang Juang, et al. A Minimum Error Rate Pattern Recognition Approach to Speech Recognition. Int. J. Pattern Recognit. Artif. Intell., 1994.
[3] Andrej Ljolje, et al. High accuracy phone recognition using context clustering and quasi-triphonic models. Comput. Speech Lang., 1994.
[4] Li Deng. Integrated optimization of dynamic feature parameters for hidden Markov modeling of speech. IEEE Signal Process. Lett., 1994.