Recognition method with parametric trajectory generated from mixture distribution HMMs

We have proposed a new speech recognition technique that generates a speech trajectory from HMMs by maximizing the likelihood of the trajectory, while accounting for the relation between the cepstrum and the dynamic cepstrum coefficients. This method has the major advantage that the relation, which is ignored in conventional speech recognition, is directly used in the speech recognition phase. This paper describes an extension of the method for dealing with HMMs whose distributions are mixture Gaussian distributions. The method chooses the sequence of Gaussian distributions by selecting the best Gaussian distribution in the state during Viterbi decoding. Speaker-independent speech recognition experiments were carried out. The proposed method obtained an 18.2% reduction in error rate for the task, proving that the proposed method is effective even for Gaussian mixture HMMs.

[1]  K. Tokuda,et al.  Speech parameter generation from HMM using dynamic features , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[2]  Herbert Gish,et al.  Parametric trajectory models for speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[3]  Shigeru Katagiri,et al.  A recognition method with parametric trajectory synthesized using direct relations between static and dynamic feature vector time series , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  S. Rocous,et al.  Stochastic segment modeling using the estimate-maximize algorithm , 1988 .

[5]  Herbert Gish,et al.  A segmental speech model with applications to word spotting , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  Herbert Gish,et al.  Hidden Markov models for trajectory modeling , 1998, ICSLP.

[7]  Mari Ostendorf,et al.  From HMM's to segment models: a unified view of stochastic modeling for speech recognition , 1996, IEEE Trans. Speech Audio Process..

[8]  Martin J. Russell,et al.  Probabilistic-trajectory segmental HMMs , 1999, Comput. Speech Lang..

[9]  Mari Ostendorf,et al.  From HMMS to Segment Models: Stochastic Modeling for CSR , 1996 .