Use of Gaussian selection in large vocabulary continuous speech recognition using HMMs

This paper investigates the use of Gaussian Selection (GS) to reduce the state likelihood computation in HMM-based systems. These likelihood calculations contribute significantly (30 to 70%) to the computational load. It has previously been reported that when GS is used on large systems, recognition accuracy tends to degrade above a ×3 reduction in likelihood computation. To explain this degradation, this paper investigates the trade-offs necessary between achieving good state likelihoods and low computation. In addition, the problem of unseen states in a cluster is examined. It is shown that further improvements are possible. For example, using a different assignment measure, with a constraint on the number of components per state per cluster, enabled the recognition accuracy on a 5k speaker-independent task to be maintained up to a ×5 reduction in likelihood computation.
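
The abstract does not spell out the mechanics of GS, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's implementation. It assumes diagonal-covariance mixture components, a small vector-quantization codebook that partitions the acoustic space into clusters, a squared-Euclidean assignment measure with a hypothetical `threshold`, a hypothetical `max_per_state` cap on the number of components per state per cluster, and a hypothetical constant `floor` that stands in for components outside the shortlist. A state with no component in the selected cluster (an "unseen state") then falls back entirely to the floor value.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def sq_dist(a, b):
    """Squared Euclidean distance (the assumed assignment measure)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def build_shortlists(codewords, states, threshold, max_per_state):
    """For each codeword, pick the (state, component) pairs to evaluate exactly.

    states[s] is a list of (weight, mean, var) tuples. A component is
    assigned to a cluster if its mean lies within `threshold` of the
    codeword; at most `max_per_state` components are kept per state.
    """
    shortlists = []
    for cw in codewords:
        selected = set()
        for s, comps in enumerate(states):
            ranked = sorted(range(len(comps)),
                            key=lambda k: sq_dist(comps[k][1], cw))
            near = [k for k in ranked if sq_dist(comps[k][1], cw) <= threshold]
            selected.update((s, k) for k in near[:max_per_state])
        shortlists.append(selected)
    return shortlists

def state_log_likelihood(x, s, states, codewords, shortlists, floor=-1000.0):
    """Approximate log-likelihood of state s; also return the exact-evaluation count."""
    # Quantize the observation to its nearest codeword (cluster).
    c = min(range(len(codewords)), key=lambda i: sq_dist(x, codewords[i]))
    terms, n_exact = [], 0
    for k, (w, mean, var) in enumerate(states[s]):
        if (s, k) in shortlists[c]:
            terms.append(math.log(w) + log_gauss(x, mean, var))
            n_exact += 1
        else:
            # Component not selected: use the assumed floor value instead.
            terms.append(math.log(w) + floor)
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms)), n_exact

# Toy 1-D model: two states with two components each, two codewords.
states = [
    [(0.5, [0.0], [1.0]), (0.5, [10.0], [1.0])],
    [(0.5, [10.0], [1.0]), (0.5, [11.0], [1.0])],
]
codewords = [[0.0], [10.0]]
shortlists = build_shortlists(codewords, states, threshold=4.0, max_per_state=1)
```

In this toy setup the trade-off the abstract describes is visible directly: tightening `threshold` or lowering `max_per_state` reduces the number of exact Gaussian evaluations per frame, but more states end up with few or no selected components in a given cluster, and their likelihoods are then dominated by the floor rather than by the true densities.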
