
Fast and accurate acoustic modelling with semi-continuous HMMs

Abstract: This paper presents the design of accurate Semi-Continuous Density Hidden Markov Models (SC-HMMs) for acoustic modelling in large vocabulary continuous speech recognition. Two methods are described that drastically improve the efficiency of the observation likelihood calculations for SC-HMMs. First, reduced SC-HMMs are created, in which each state does not share all the (Gaussian) probability density functions (pdfs) but only those that are important for it. It is shown how the average number of Gaussians per state can be reduced to 70 for a total set of 10 000 Gaussians. Second, a novel scalar selection algorithm is presented that reduces the number of Gaussians that have to be calculated to 5% of the total set of 10 000, without any degradation in recognition performance. Furthermore, the concept of tied-state context-dependent modelling with phonetic decision trees is adapted to SC-HMMs. Specifically, a node splitting criterion appropriate for SC-HMMs is introduced: it is based on a distance measure between the mixtures of Gaussian pdfs involved in SC-HMM state modelling. This contrasts with other criteria from the literature, which rely on simplified pdfs to keep the algorithmic complexity manageable. On the ARPA Resource Management task, the proposed criterion achieved a relative reduction in word error rate of 8% compared with two known criteria based on simplified pdfs.
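The first method above, reduced SC-HMMs, can be illustrated with a minimal sketch. It assumes a shared codebook of diagonal-covariance Gaussians and represents each reduced state as a hypothetical dictionary that maps codebook indices to mixture weights. The names `log_gaussian`, `reduced_state_loglik` and the toy values are illustrative only and are not taken from the paper. The sketch shows why pruning helps: each needed Gaussian is evaluated once per frame and shared by all states, and each state then sums over its own short list (about 70 entries on average in the paper's setup) instead of all 10 000. The paper's scalar selection algorithm is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A codebook entry: (mean vector, variance vector) of a diagonal Gaussian.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def log_gaussian(x: Sequence[float], g: Gaussian) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    mean, var = g
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def reduced_state_loglik(
    x: Sequence[float],
    codebook: List[Gaussian],
    states: Dict[str, Dict[int, float]],
) -> Dict[str, float]:
    """Log-likelihood of frame x for every reduced SC-HMM state.

    Assumption: each state stores only its own subset of codebook indices
    with (renormalized, strictly positive) weights, not a weight for every
    Gaussian in the shared codebook.
    """
    # Union of the Gaussians referenced by any state: each is computed once.
    needed = set()
    for weights in states.values():
        needed.update(weights)
    cache = {k: log_gaussian(x, codebook[k]) for k in needed}

    # Each state mixes only its own Gaussians: log sum_k w_k * N_k(x).
    return {
        name: logsumexp([math.log(w) + cache[k] for k, w in weights.items()])
        for name, weights in states.items()
    }


if __name__ == "__main__":
    # Toy 1-D codebook and two reduced states (hypothetical values).
    codebook = [([0.0], [1.0]), ([3.0], [1.0]), ([10.0], [1.0])]
    states = {"s1": {0: 0.5, 1: 0.5}, "s2": {2: 1.0}}
    print(reduced_state_loglik([0.0], codebook, states))
```

In this sketch the per-state cost grows with the subset size only, while the per-frame Gaussian evaluations in `cache` are the remaining bulk of the work; that is the part a Gaussian selection step, such as the scalar selection described in the abstract, would target.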

Dirk Van Compernolle | Kris Demuynck | Jacques Duchateau

[1] Dirk Van Compernolle, et al. Reduced semi-continuous models for large vocabulary continuous speech recognition in Dutch, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[2] Dirk Van Compernolle, et al. A novel node splitting criterion in decision tree construction for semi-continuous HMMs, 1997, EUROSPEECH.

[3] Michael Picheny, et al. Decision trees for phonological rules in continuous speech, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[4] Koichi Shinoda, et al. High speed speech recognition using tree-structured probability density function, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[5] Mark J. F. Gales, et al. Use of Gaussian selection in large vocabulary continuous speech recognition using HMMs, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6] Enrico Bocchieri, et al. Vector quantization for the efficient computation of continuous density likelihoods, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7] Chin-Hui Lee, et al. Acoustic modeling for large vocabulary speech recognition, 1990.

[8] R. Haeb-Umbach, et al. Application of clustering techniques to mixture density modelling for continuous-speech recognition, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[9] Alexander H. Waibel, et al. Speeding up the score computation of HMM speech recognizers with the bucket Voronoi intersection algorithm, 1995, EUROSPEECH.

[10] Michael Picheny, et al. Robust methods for using context-dependent features and models in a continuous speech recognizer, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[11] Katarina Bartkova, et al. Parameter tying for flexible speech recognition, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[12] Vassilios Digalakis, et al. Genones: generalized mixture tying in continuous hidden Markov model-based speech recognizers, 1996, IEEE Trans. Speech Audio Process.

[13] Mei-Yuh Hwang, et al. Predicting unseen triphones with senones, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14] J. J. Odell, et al. The Use of Context in Large Vocabulary Speech Recognition, 1995.

[15] Douglas B. Paul. The Lincoln tied-mixture HMM continuous speech recognizer, 1990.

[16] Peter Beyerlein, et al. Fast log-likelihood computation for mixture densities in a high-dimensional feature space, 1994, ICSLP.

[17] Jerome R. Bellegarda, et al. Tied mixture continuous parameter models for large vocabulary isolated speech recognition, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[18] Roland Kuhn, et al. Improved decision trees for phonetic modeling, 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[19] Patrick Kenny, et al. Optimal tying of HMM mixture densities using decision trees, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[20] Xuedong Huang, et al. Semi-continuous hidden Markov models for speech signals, 1990.