Syllable-length path mixture hidden Markov models with trajectory clustering for continuous speech recognition

Recent research suggests that coarticulation in speech is more appropriately modeled at the syllable level. However, because a number of additional factors affect the way syllables are articulated, multiple acoustic models per syllable may be necessary. Our previous research on longer-length multi-path models has shown data-driven trajectory clustering to be an attractive approach for deriving multi-path models. However, using a single distribution with unvarying covariance to model a trajectory cluster may degrade its ability to detect pronunciation variants. In this paper, we propose a new method, the path mixture hidden Markov model, to alleviate the adverse effects of trajectory clustering. The performance improvement observed in continuous speech recognition experiments shows that the path mixture model is an effective approach.
