Reformulating the HMM as a Trajectory Model

We have shown that an HMM whose state output vector includes both static and dynamic feature parameters can be reformulated as a trajectory model by imposing the explicit relationship between the static and dynamic features. The derived model, referred to as the "trajectory HMM," alleviates two limitations of standard HMMs: i) piecewise-constant statistics within an HMM state and ii) the conditional-independence assumption on state output probabilities. In this paper, we first summarize the definition of the trajectory HMM and its training algorithm. Then, to show that the trajectory HMM is a proper generative model, we derive a new algorithm for sampling from it and report an illustrative experiment. A speech recognition experiment demonstrates that consistency between the training and decoding criteria is essential: the model should not only be trained as a trajectory model but also be used as a trajectory model in decoding, even though the trajectory model has the same parameterization as the standard HMM.
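As a brief sketch of the reformulation (notation assumed here, following the standard speech-parameter-generation setup rather than quoted from the paper): let c denote the static feature trajectory and o the HMM observation sequence obtained by appending dynamic features, so that o = Wc for a window matrix W built from the delta coefficients. For a state sequence q with concatenated mean vector μ_q and covariance Σ_q, imposing o = Wc turns the HMM output density over o into a properly normalized Gaussian over the static trajectory itself:

$$
P(\mathbf{c}\mid \mathbf{q},\lambda)
  = \frac{1}{Z_{\mathbf{q}}}\,
    \mathcal{N}\!\left(W\mathbf{c};\,\boldsymbol{\mu}_{\mathbf{q}},\,\Sigma_{\mathbf{q}}\right)
  = \mathcal{N}\!\left(\mathbf{c};\,\bar{\mathbf{c}}_{\mathbf{q}},\,P_{\mathbf{q}}\right),
\qquad
R_{\mathbf{q}} = P_{\mathbf{q}}^{-1} = W^{\top}\Sigma_{\mathbf{q}}^{-1}W,
\quad
R_{\mathbf{q}}\bar{\mathbf{c}}_{\mathbf{q}} = W^{\top}\Sigma_{\mathbf{q}}^{-1}\boldsymbol{\mu}_{\mathbf{q}}.
$$

Given this form, sampling from the trajectory model for a fixed state sequence reduces to drawing from a Gaussian specified by its precision matrix R_q. Below is a minimal sketch, assuming dense NumPy/SciPy arrays and a hypothetical helper name; a practical implementation would exploit the banded structure of R_q instead of dense factorizations.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def sample_trajectory(W, mu, Sigma_inv, rng=None):
    """Draw one static-feature trajectory c from the Gaussian
    N(c; c_bar, R^{-1}) implied by an HMM over o = W c.

    W         : window matrix mapping static features to static+dynamic features
    mu        : concatenated state-output means along the (fixed) state path
    Sigma_inv : concatenated inverse covariances (block diagonal) along the path
    """
    rng = np.random.default_rng() if rng is None else rng
    R = W.T @ Sigma_inv @ W          # precision of c: R = W' Sigma^{-1} W
    r = W.T @ Sigma_inv @ mu         # r = W' Sigma^{-1} mu
    L = cholesky(R, lower=True)      # R = L L'
    # mean trajectory: solve R c_bar = r via two triangular solves
    y = solve_triangular(L, r, lower=True)
    c_bar = solve_triangular(L.T, y, lower=False)
    # sample: c = c_bar + L^{-T} z has covariance (L L')^{-1} = R^{-1}
    z = rng.standard_normal(c_bar.shape)
    return c_bar + solve_triangular(L.T, z, lower=False)
```

Setting z to zero recovers the mean trajectory c̄_q, i.e., the conventional parameter-generation output, while nonzero z yields trajectories that fluctuate around it according to the model's temporal covariance.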
