Fast and accurate acoustic modelling with semi-continuous HMMs

Abstract: This paper presents the design of accurate semi-continuous density hidden Markov models (SC-HMMs) for acoustic modelling in large vocabulary continuous speech recognition. Two methods are described that drastically improve the efficiency of the observation likelihood calculations for SC-HMMs. First, reduced SC-HMMs are created, in which each state shares only those Gaussian probability density functions (pdfs) that are important for it, rather than the full set. It is shown how the average number of Gaussians per state can be reduced to 70 for a total set of 10 000 Gaussians. Second, a novel scalar selection algorithm is presented that reduces the number of Gaussians that have to be calculated to 5% of the total set of 10 000, without any degradation in recognition performance. Furthermore, the concept of tied-state context-dependent modelling with phonetic decision trees is adapted to SC-HMMs. To this end, a node splitting criterion appropriate for SC-HMMs is introduced: it is based on a distance measure between the mixtures of Gaussian pdfs involved in SC-HMM state modelling. This contrasts with other criteria from the literature, which are based on simplified pdfs to manage the algorithmic complexity. On the ARPA Resource Management task, the proposed criterion achieved a relative reduction in word error rate of 8% compared with two known criteria based on simplified pdfs.
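To make the idea of a reduced SC-HMM concrete, the following is a minimal sketch and not the authors' implementation. It assumes diagonal-covariance Gaussians and stores, for each state, only the mixture weights of the Gaussians it retains. The names `log_gaussian_diag`, `state_likelihoods`, `codebook` and `state_weights` are hypothetical. The scalar selection algorithm is not sketched, since the abstract does not give its details.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x (assumed form)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def state_likelihoods(x, codebook, state_weights):
    """Observation likelihood of frame x for every state.

    codebook      -- list of (mean, var) pairs shared by all states
    state_weights -- one dict per state mapping Gaussian index -> weight,
                     holding only the Gaussians retained for that state
    Returns the per-state likelihoods and the number of Gaussians evaluated.
    """
    cache = {}  # each shared Gaussian is evaluated at most once per frame
    likelihoods = []
    for weights in state_weights:
        total = 0.0
        for k, w in weights.items():
            if k not in cache:
                mean, var = codebook[k]
                cache[k] = math.exp(log_gaussian_diag(x, mean, var))
            total += w * cache[k]
        likelihoods.append(total)
    return likelihoods, len(cache)


# Toy example: three shared 1-D Gaussians, two states with sparse weights.
codebook = [([0.0], [1.0]), ([5.0], [1.0]), ([-5.0], [1.0])]
state_weights = [{0: 0.7, 1: 0.3}, {0: 1.0}]
likes, n_evaluated = state_likelihoods([0.0], codebook, state_weights)
```

In this toy setup the third Gaussian is never evaluated because no state retains it. The per-state cost depends on the number of retained Gaussians rather than on the size of the shared set.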
