Gaussian mixture selection using context-independent HMM

We present a method for efficiently selecting Gaussian mixtures to speed up acoustic likelihood computation. It uses context-independent (monophone) models both to select the corresponding triphone models and to back off from them. Specifically, the higher-resolution triphone models are applied only to the k-best phone models determined by a preliminary evaluation, and all other states are assigned likelihoods computed with the monophone models. This selection scheme assigns more reliable back-off likelihoods to the unselected states than conventional Gaussian selection based on a VQ codebook. It can also incorporate efficient Gaussian pruning in the preliminary evaluation, which offsets the increased size of the pre-selection model. Experimental results show that the proposed method achieves performance comparable to standard Gaussian selection and performs much better under aggressive pruning conditions. Combined with phonetic tied-mixture modeling, the method reduces the acoustic matching cost to almost 14% of the original with little loss of accuracy.
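The selection and back-off scheme described above can be sketched for a single frame as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes diagonal-covariance Gaussian mixtures, a single emitting state per phone (so a monophone state and its phone coincide), and a mapping from each triphone state to its centre phone. The Gaussian pruning in the preliminary evaluation is a hypothetical beam-based partial-distance scheme chosen for illustration, and all names (`gmm_loglik`, `select_and_score`, `tri_to_mono`, `beam`) are hypothetical.

```python
import math
from dataclasses import dataclass

NEG_INF = float("-inf")


@dataclass
class Gaussian:
    weight: float
    mean: list[float]
    var: list[float]  # diagonal covariance (assumption)

    def log_const(self) -> float:
        # log(weight) plus the log normalisation term of the Gaussian
        return math.log(self.weight) - 0.5 * sum(math.log(2 * math.pi * v) for v in self.var)


def log_add(a: float, b: float) -> float:
    # Numerically stable log(exp(a) + exp(b)).
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def gmm_loglik(x, gaussians, beam=None):
    """Log-likelihood of frame x under a diagonal-covariance GMM.

    If `beam` is given (hypothetical pruning), each Gaussian's score is
    accumulated dimension by dimension and abandoned as soon as it falls more
    than `beam` below the best fully evaluated Gaussian so far. Because every
    dimension only lowers the score, an abandoned Gaussian could not have
    finished above that threshold.
    """
    total, best = NEG_INF, NEG_INF
    for g in gaussians:
        score = g.log_const()
        pruned = False
        for xi, mu, v in zip(x, g.mean, g.var):
            score -= 0.5 * (xi - mu) ** 2 / v
            if beam is not None and score < best - beam:
                pruned = True
                break
        if pruned:
            continue
        best = max(best, score)
        total = log_add(total, score)
    return total


def select_and_score(x, mono, tri, tri_to_mono, k, beam=None):
    """Assign a log-likelihood to every triphone state for frame x.

    mono: phone -> GMM (context-independent models, one state per phone here)
    tri: triphone-state id -> GMM
    tri_to_mono: triphone-state id -> its centre phone
    """
    # 1. Preliminary evaluation with the (optionally pruned) monophone models.
    mono_scores = {p: gmm_loglik(x, g, beam) for p, g in mono.items()}
    # 2. Keep the k-best phones.
    ranked = sorted(mono_scores, key=mono_scores.get, reverse=True)
    selected = set(ranked[:k])
    # 3. Selected phones get the detailed triphone likelihood;
    #    every other triphone state backs off to its monophone likelihood.
    scores = {}
    for state, gmm in tri.items():
        phone = tri_to_mono[state]
        scores[state] = gmm_loglik(x, gmm) if phone in selected else mono_scores[phone]
    return scores, selected


mono = {"a": [Gaussian(1.0, [0.0], [1.0])], "i": [Gaussian(1.0, [5.0], [1.0])]}
tri = {"k-a+t": [Gaussian(1.0, [0.2], [0.5])], "s-i+t": [Gaussian(1.0, [5.5], [0.5])]}
tri_to_mono = {"k-a+t": "a", "s-i+t": "i"}
scores, selected = select_and_score([0.1], mono, tri, tri_to_mono, k=1)
print(selected)  # {'a'}: only the triphone state of "a" is evaluated in detail
```

In this sketch an unselected state still receives a likelihood from its own monophone model rather than a generic back-off value, which is the property the abstract contrasts with VQ-codebook-based Gaussian selection. The number of triphone Gaussians actually evaluated grows with `k`, so `k` (together with `beam`) is the knob trading speed for accuracy.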
