Mixture Gaussian HMM-trajectory method using likelihood compensation

We propose a new speech recognition method, the HMM-trajectory method, which generates a speech trajectory from HMMs by maximizing their likelihood while accounting for the relationship between MFCCs and dynamic MFCCs. One major advantage of this method is that this relationship, which is ignored in conventional speech recognition, is used directly in the recognition phase. This paper improves the recognition performance of the HMM-trajectory method when the HMM states have Gaussian mixture distributions. The HMM-trajectory method chooses the Gaussian distribution sequence for the HMM states by selecting the best Gaussian distribution in each state during Viterbi decoding, and then calculates the HMM trajectory likelihood along that sequence. The proposed method additionally compensates the HMM trajectory likelihood using the ordinary HMM likelihood. In speaker-independent speech recognition experiments, the proposed method reduced the error rate by about 10% compared with HMMs, demonstrating its effectiveness for Gaussian mixture components.
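As an illustration of the trajectory-generation idea summarized above, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional static feature, a simple two-frame delta window with clamped edges, and diagonal covariances, and it solves the standard maximum-likelihood parameter generation problem for a fixed Gaussian sequence. The abstract does not give the compensation formula, so `compensated_score` and its `weight` parameter are purely hypothetical: a weighted sum of log-likelihoods is only one plausible form.

```python
import numpy as np


def delta_window_matrix(num_frames):
    """Build W (2T x T) mapping a static sequence c to interleaved
    [static_t, delta_t] rows. Assumed delta: 0.5 * (c[t+1] - c[t-1]),
    with indices clamped at the sequence edges."""
    W = np.zeros((2 * num_frames, num_frames))
    for t in range(num_frames):
        W[2 * t, t] = 1.0                       # static row
        prev_t = max(t - 1, 0)
        next_t = min(t + 1, num_frames - 1)
        W[2 * t + 1, next_t] += 0.5             # delta row
        W[2 * t + 1, prev_t] -= 0.5
    return W


def generate_trajectory(means, variances):
    """Return the static trajectory c that maximizes N(W c; mu, Sigma).

    means, variances: arrays of shape (T, 2) holding the per-frame
    [static, delta] mean and variance of the chosen Gaussian sequence.
    Closed form: c = (W^T P W)^{-1} W^T P mu, with P = Sigma^{-1}."""
    num_frames = means.shape[0]
    W = delta_window_matrix(num_frames)
    mu = means.reshape(-1)                      # interleaved [s0, d0, s1, d1, ...]
    precision = 1.0 / variances.reshape(-1)
    A = W.T @ (precision[:, None] * W)
    b = W.T @ (precision * mu)
    return np.linalg.solve(A, b)


def compensated_score(traj_loglik, hmm_loglik, weight):
    """Hypothetical compensation: add a weighted ordinary HMM
    log-likelihood to the trajectory log-likelihood. The actual
    formula is not stated in the abstract."""
    return traj_loglik + weight * hmm_loglik


if __name__ == "__main__":
    # Static and delta means that are mutually consistent: the generated
    # trajectory should then reproduce the static means.
    c_true = np.array([0.0, 1.0, 2.0, 3.0])
    means = (delta_window_matrix(4) @ c_true).reshape(4, 2)
    variances = np.ones((4, 2))
    print(generate_trajectory(means, variances))
```

When the static and delta means disagree, the solution is a compromise between them weighted by the precisions, which is how the static-dynamic relationship constrains the generated trajectory in this sketch.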
