A Viterbi algorithm for a trajectory model derived from HMM with explicit relationship between static and dynamic features

This paper introduces a Viterbi algorithm that obtains a sub-optimal state sequence for the trajectory-HMM, a model derived from the HMM by imposing an explicit relationship between static and dynamic features. Without increasing the number of model parameters, the trajectory-HMM can alleviate two limitations of the HMM: (i) constant statistics within each HMM state and (ii) conditional independence of observations given the state sequence. The proposed algorithm was applied to state-boundary optimization in Viterbi training and to N-best rescoring. In a speaker-dependent continuous speech recognition experiment, the trajectory-HMM with the proposed algorithm achieved an error reduction of about 14% over the standard HMM with the conventional Viterbi algorithm.
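The abstract gives no equations, so the sketch below rests on an assumed, commonly used trajectory-HMM formulation. Static features c are mapped to static-plus-delta observations o = Wc, and for a fixed state sequence q the model is the Gaussian N(c; c_bar_q, P_q), with R_q = W^T Sigma_q^-1 W, r_q = W^T Sigma_q^-1 mu_q, P_q = R_q^-1 and c_bar_q = P_q r_q. Because R_q couples all frames, the score of q does not split into per-frame terms, which is why an exact frame-synchronous Viterbi recursion does not apply and a sub-optimal search is needed. This is a minimal Python sketch under further hypothetical simplifications: 1-D features, a delta window 0.5(c[t+1] - c[t-1]) with out-of-range frames treated as zero, diagonal per-state Gaussians given as (mu, dmu, var, dvar) tuples, no transition probabilities, and one left-to-right segment per state. The names trajectory_loglik, refine_boundaries and the helpers are illustrative. The search is a plain greedy hill-climb over state boundaries, swapped in only to show what "sub-optimal" means here; it is not the Viterbi algorithm proposed in the paper.

```python
import math

# Illustrative assumptions (not from the paper): 1-D statics, delta window
# 0.5 * (c[t+1] - c[t-1]) with zero padding, diagonal (mu, dmu, var, dvar)
# state Gaussians, and transition probabilities ignored.


def delta_row(t, T):
    """Nonzero coefficients of the delta row of W at frame t, as {frame: weight}."""
    row = {}
    if t - 1 >= 0:
        row[t - 1] = -0.5
    if t + 1 < T:
        row[t + 1] = 0.5
    return row


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def trajectory_loglik(c, q, states):
    """log N(c; R^-1 r, R^-1) with R = W^T S^-1 W, r = W^T S^-1 m for state sequence q."""
    T = len(c)
    R = [[0.0] * T for _ in range(T)]
    r = [0.0] * T
    for t in range(T):
        mu, dmu, var, dvar = states[q[t]]
        R[t][t] += 1.0 / var          # static row of W is the unit vector e_t
        r[t] += mu / var
        d = delta_row(t, T)           # delta row of W
        for i, wi in d.items():
            r[i] += wi * dmu / dvar
            for j, wj in d.items():
                R[i][j] += wi * wj / dvar
    L = cholesky(R)
    logdet_R = 2.0 * sum(math.log(L[i][i]) for i in range(T))
    # Mean trajectory c_bar = R^-1 r: forward (L y = r), then backward (L^T c_bar = y).
    y = [0.0] * T
    for i in range(T):
        y[i] = (r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    c_bar = [0.0] * T
    for i in reversed(range(T)):
        c_bar[i] = (y[i] - sum(L[k][i] * c_bar[k] for k in range(i + 1, T))) / L[i][i]
    diff = [c[i] - c_bar[i] for i in range(T)]
    quad = sum(diff[i] * R[i][j] * diff[j] for i in range(T) for j in range(T))
    return -0.5 * T * math.log(2.0 * math.pi) + 0.5 * logdet_R - 0.5 * quad


def segments_to_states(bounds, T):
    """Expand left-to-right boundaries [b_1, ..., b_{N-1}] into a per-frame state sequence."""
    edges = [0] + list(bounds) + [T]
    q = []
    for s in range(len(edges) - 1):
        q.extend([s] * (edges[s + 1] - edges[s]))
    return q


def refine_boundaries(c, states, max_iter=50):
    """Greedy +/-1 boundary hill-climbing (hypothetical; not the paper's Viterbi algorithm)."""
    T, N = len(c), len(states)
    bounds = [k * T // N for k in range(1, N)]  # uniform start; requires T >= N
    best = trajectory_loglik(c, segments_to_states(bounds, T), states)
    for _ in range(max_iter):
        improved = False
        for k in range(len(bounds)):
            for step in (-1, 1):
                cand = list(bounds)
                cand[k] += step
                edges = [0] + cand + [T]
                if any(edges[i + 1] <= edges[i] for i in range(N)):
                    continue  # every state must keep at least one frame
                score = trajectory_loglik(c, segments_to_states(cand, T), states)
                if score > best:
                    bounds, best, improved = cand, score, True
        if not improved:
            break
    return bounds, best
```

For example, with states = [(0.0, 0.0, 0.1, 10.0), (5.0, 0.0, 0.1, 10.0)] and c = [0.0] * 3 + [5.0] * 5, the uniform start puts the boundary at frame 4 and the hill-climb moves it to frame 3, where the static values change. Each likelihood evaluation here solves a dense T x T system, and the greedy search can stop at a local optimum; both are simplifications chosen for readability.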
