Trajectory Clustering of Syllable-Length Acoustic Models for Continuous Speech Recognition

Recent research suggests that coarticulation in speech is more appropriately modeled at the syllable level. However, because a number of additional factors affect the way syllables are articulated, syllable models may need multiple paths. Our previous research on longer-length multi-path models in connected digit recognition showed trajectory clustering to be an attractive approach to deriving multi-path models. In this paper, we extend that research to large vocabulary continuous speech recognition by deriving trajectory clusters for 94 very frequent syllables in a 20-hour data set of Dutch read speech. The resulting clusters are compared with a knowledge-based classification. The comparison suggests that multi-path models for syllables are difficult to build from phonetic and linguistic knowledge. When multi-path models based on trajectory clustering are used, speech recognition performance improves significantly. We therefore conclude that data-driven trajectory clustering is an effective approach to developing multi-path models.
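The abstract does not spell out the clustering procedure. Purely as an illustration of the idea of splitting the tokens of one syllable into trajectory clusters, the sketch below linearly time-normalizes each variable-length feature trajectory to a fixed number of frames and groups the resulting vectors with plain k-means. Both choices are assumptions made for this example and are not the method used in the paper; the function names and the parameters `n_points` and `k` are hypothetical. Each resulting cluster would correspond to one path through a multi-path syllable model.

```python
import random
from typing import List, Optional, Sequence

Frame = Sequence[float]          # one acoustic feature vector
Trajectory = Sequence[Frame]     # one syllable token: a sequence of frames


def resample(traj: Trajectory, n_points: int) -> List[float]:
    """Linearly interpolate a trajectory to n_points frames and flatten it.

    Assumption: simple linear time normalization stands in for whatever
    alignment the original work used.
    """
    length = len(traj)
    dim = len(traj[0])
    out: List[float] = []
    for i in range(n_points):
        pos = i * (length - 1) / (n_points - 1) if n_points > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, length - 1)
        w = pos - lo
        for d in range(dim):
            out.append((1.0 - w) * traj[lo][d] + w * traj[hi][d])
    return out


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int, n_iter: int = 50, seed: int = 0) -> List[int]:
    """Plain k-means; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels: Optional[List[int]] = None
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels if labels is not None else [0] * len(vectors)


def cluster_syllable_tokens(tokens: List[Trajectory], k: int, n_points: int = 10) -> List[int]:
    """Hypothetical helper: group tokens of one syllable into k trajectory clusters."""
    vectors = [resample(t, n_points) for t in tokens]
    return kmeans(vectors, k)


if __name__ == "__main__":
    # Toy 1-D "trajectories" of different lengths: two rising, two falling.
    rising = [[[0.0], [1.0], [2.0]], [[0.0], [0.5], [1.0], [1.5], [2.0]]]
    falling = [[[2.0], [1.0], [0.0]], [[2.0], [1.5], [1.0], [0.5], [0.0]]]
    print(cluster_syllable_tokens(rising + falling, k=2, n_points=5))
```

In this toy run the two rising tokens share one label and the two falling tokens share the other, even though their lengths differ, because time normalization makes each trajectory's shape, not its duration, drive the clustering.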
