A novel node splitting criterion in decision tree construction for semi-continuous HMMs

In [1], we described how to make Semi-Continuous Density Hidden Markov Models (SC-HMMs) as fast as Continuous Density HMMs (CD-HMMs) while outperforming them on large-vocabulary recognition tasks with context-independent models. In this paper, we extend our work on SC-HMMs to context-dependent modelling. We propose a novel node splitting criterion for phonetic decision tree construction. It is based on a distance measure between Gaussian mixture probability density functions (pdfs) of the kind used in the final tied-state SC-HMMs. This contrasts with other criteria, which rely on simplified pdfs to keep the complexity of the algorithm manageable. Results on the ARPA Resource Management task show that the proposed criterion outperforms two of these simplified-pdf criteria.
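To make the idea of a mixture-based splitting criterion concrete, the following is a minimal sketch and not the criterion proposed in the paper, whose distance measure is not given in this abstract. It assumes that all states share one Gaussian codebook, so that each state is characterised by its mixture weights alone, and it uses a symmetric Kullback-Leibler divergence between the pooled weight vectors of the two child nodes as a stand-in distance. All names (`normalise`, `symmetric_kl`, `pool`, `best_split`), the toy occupancy counts, and the two phonetic questions are hypothetical.

```python
import math


def normalise(counts, floor=1e-6):
    """Convert occupancy counts over the shared codebook into floored mixture weights."""
    total = sum(counts)
    weights = [max(c / total, floor) for c in counts]
    norm = sum(weights)
    return [w / norm for w in weights]


def symmetric_kl(p, q):
    """Symmetric KL divergence between two discrete weight vectors (assumed stand-in distance)."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))


def pool(states):
    """Sum the per-Gaussian occupancy counts of a set of (context, counts) states."""
    return [sum(column) for column in zip(*(counts for _, counts in states))]


def best_split(states, questions):
    """Return (question name, distance) for the question whose children are furthest apart."""
    best = None
    for name, predicate in questions.items():
        yes = [s for s in states if predicate(s[0])]
        no = [s for s in states if not predicate(s[0])]
        if not yes or not no:
            continue  # the question does not split this node
        distance = symmetric_kl(normalise(pool(yes)), normalise(pool(no)))
        if best is None or distance > best[1]:
            best = (name, distance)
    return best


# Toy node: four left contexts, each with counts over a 3-Gaussian shared codebook.
states = [
    ("m", [40, 5, 5]),
    ("n", [35, 10, 5]),
    ("s", [5, 10, 35]),
    ("z", [5, 5, 40]),
]
questions = {
    "left_is_nasal": lambda c: c in {"m", "n"},
    "left_is_voiced": lambda c: c in {"m", "n", "z"},
}

print(best_split(states, questions))
```

On this toy data the nasal question is selected, because it separates the contexts whose weight mass sits on the first Gaussian from those whose mass sits on the third. A real system would also need a stopping rule, such as a minimum occupancy per leaf, which this sketch leaves out.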

[1] Michael Picheny, et al. Decision trees for phonological rules in continuous speech, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[2] Mei-Yuh Hwang, et al. Predicting unseen triphones with senones, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3] Dirk Van Compernolle, et al. Reduced semi-continuous models for large vocabulary continuous speech recognition in Dutch, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[4] Roland Kuhn, et al. Improved decision trees for phonetic modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[5] Patrick Kenny, et al. Optimal tying of HMM mixture densities using decision trees, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6] Michael Picheny, et al. Robust methods for using context-dependent features and models in a continuous speech recognizer, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[7] J. J. Odell. The Use of Context in Large Vocabulary Speech Recognition, 1995.