Joint MCE estimation of VQ and HMM parameters for Gaussian mixture selection

Vector quantization (VQ) has been explored in the past as a means of reducing likelihood computation in speech recognizers that use hidden Markov models (HMMs) with Gaussian output densities. Although this approach has proved successful, there is a point beyond which further reduction in likelihood computation substantially degrades recognition accuracy. Since the components of the VQ frontend are typically designed after model training is complete, this degradation can be attributed to the fact that the VQ and HMM parameters are not jointly estimated. To restore the accuracy of a recognizer that uses VQ to aggressively reduce computation, joint estimation is necessary. We propose a technique that couples VQ frontend design with minimum classification error (MCE) training. We demonstrate on a large-vocabulary subword task that in certain cases our joint training algorithm can reduce the string error rate by 79% relative to VQ mixture selection alone.
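To make the VQ frontend concrete, the following is a minimal sketch of VQ-based Gaussian selection as described above: each frame is quantized against a small codebook, and only the Gaussians shortlisted for that codeword are evaluated. All names, shapes, and the nearest-mean shortlist heuristic here are illustrative assumptions, not the paper's actual design (which, per the abstract, would further tune these parameters jointly with MCE training).

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4            # feature dimension (assumed)
M = 16           # Gaussians in the mixture (assumed)
K = 8            # VQ codebook size (assumed)
SHORTLIST = 4    # Gaussians evaluated per codeword (assumed)

means = rng.normal(size=(M, D))          # diagonal-covariance Gaussians
variances = np.full((M, D), 1.0)
codebook = rng.normal(size=(K, D))       # VQ codewords

# Offline: for each codeword, shortlist the Gaussians whose means lie closest.
dists = np.linalg.norm(codebook[:, None, :] - means[None, :, :], axis=-1)
shortlists = np.argsort(dists, axis=1)[:, :SHORTLIST]   # shape (K, SHORTLIST)

def log_gauss(x, mu, var):
    """Log density of diagonal-covariance Gaussians at frame x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)

def selected_log_likelihoods(x):
    """Quantize the frame, then score only the shortlisted Gaussians."""
    cw = np.argmin(np.linalg.norm(codebook - x, axis=-1))
    idx = shortlists[cw]
    return idx, log_gauss(x, means[idx], variances[idx])

frame = rng.normal(size=D)
idx, ll = selected_log_likelihoods(frame)
# Only SHORTLIST of the M Gaussians were evaluated for this frame; the
# remainder are typically backed off to a floor value in a real decoder.
```

The computational saving comes from replacing M density evaluations per frame with one codebook search plus SHORTLIST evaluations; the abstract's point is that pushing SHORTLIST too low hurts accuracy unless the codebook and HMM parameters are trained jointly.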
