Paper information - Use of Gaussian selection in large vocabulary continuous speech recognition using HMMs


This paper investigates the use of Gaussian Selection (GS) to reduce the state likelihood computation in HMM-based systems. These likelihood calculations contribute significantly (30 to 70%) to the computational load. Previously, it has been reported that when GS is used on large systems the recognition accuracy tends to degrade above a ×3 reduction in likelihood computation. To explain this degradation, this paper investigates the trade-offs necessary between achieving good state likelihoods and low computation. In addition, the problem of unseen states in a cluster is examined. It is shown that further improvements are possible. For example, using a different assignment measure, with a constraint on the number of components per state per cluster, enabled the recognition accuracy on a 5k speaker-independent task to be maintained up to a ×5 reduction in likelihood computation.
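The general idea of Gaussian selection summarised above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal-covariance Gaussians, a small vector-quantisation codebook, squared Euclidean distance as a stand-in assignment measure (the abstract does not specify the measure used), and a fixed floor value for states that have no component in the selected cluster. All function names, the dictionary layout, and the parameters `threshold`, `max_per_state`, and `floor` are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def sq_dist(a, b):
    """Squared Euclidean distance (assumed assignment measure)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def build_shortlists(components, codebook, threshold, max_per_state):
    """For each codeword, shortlist components whose mean lies within
    `threshold` of the centroid, nearest first, keeping at most
    `max_per_state` components per state."""
    shortlists = []
    for centroid in codebook:
        ranked = sorted(
            (sq_dist(c["mean"], centroid), idx)
            for idx, c in enumerate(components)
        )
        per_state, chosen = {}, []
        for dist, idx in ranked:
            state = components[idx]["state"]
            if dist <= threshold and per_state.get(state, 0) < max_per_state:
                chosen.append(idx)
                per_state[state] = per_state.get(state, 0) + 1
        shortlists.append(chosen)
    return shortlists

def state_log_likelihoods(x, components, codebook, shortlists, floor=-1e3):
    """Return ({state: log-likelihood}, number of Gaussians evaluated)."""
    # Quantise the observation to its nearest codeword.
    k = min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))
    totals = {}
    for idx in shortlists[k]:
        c = components[idx]
        term = math.log(c["weight"]) + log_gauss(x, c["mean"], c["var"])
        state = c["state"]
        totals[state] = term if state not in totals else log_add(totals[state], term)
    # States with no shortlisted component ("unseen" in this cluster)
    # receive the assumed floor instead of an exact likelihood.
    states = {c["state"] for c in components}
    return {s: totals.get(s, floor) for s in states}, len(shortlists[k])

# Toy model: two states, three Gaussians, two codewords.
components = [
    {"state": 0, "weight": 0.5, "mean": [0.0, 0.0], "var": [1.0, 1.0]},
    {"state": 0, "weight": 0.5, "mean": [0.5, 0.0], "var": [1.0, 1.0]},
    {"state": 1, "weight": 1.0, "mean": [5.0, 5.0], "var": [1.0, 1.0]},
]
codebook = [[0.0, 0.0], [5.0, 5.0]]
shortlists = build_shortlists(components, codebook, threshold=2.0, max_per_state=1)
scores, n_evaluated = state_log_likelihoods([0.1, -0.1], components, codebook, shortlists)
print(shortlists, n_evaluated, scores)
```

In this toy setup only one of the three Gaussians is evaluated for the observation. State 1 has no component in the selected cluster, so it falls back to the floor, which is the kind of unseen-state situation the abstract refers to. Tightening `threshold` or lowering `max_per_state` reduces computation further at the cost of less accurate state likelihoods.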

Mark J. F. Gales | Kate Knill | Steve J. Young
