Weighted distance measures for efficient reduction of Gaussian mixture components in HMM-based acoustic model

In this paper, two weighted distance measures, the weighted K-L divergence and the Bayesian criterion-based distance measure, are proposed to efficiently reduce the Gaussian mixture components in an HMM-based acoustic model. Conventional distance measures such as the K-L divergence and the Bhattacharyya distance consider only distribution parameters (i.e., the mean and variance vectors of Gaussian pdfs), while another existing measure considers only mixture weights. In contrast, the two proposed distance measures consider both distribution parameters and mixture weights. Experimental results showed that the component-reduced acoustic models created using the proposed distance measures were more compact and computationally efficient than those created using conventional distance measures.
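As a rough illustration of the conventional measures named above, the following sketch computes the K-L divergence and the Bhattacharyya distance between two diagonal-covariance Gaussians using their standard closed forms. The abstract does not give the formulas of the proposed measures, so the function `weighted_sym_kl` is only a hypothetical example of how mixture weights might enter a distance; it should not be read as the paper's definition. All function and parameter names are assumptions made for this sketch.

```python
import math


def kl_diag(mu_p, var_p, mu_q, var_q):
    """K-L divergence KL(p || q) between two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def bhattacharyya_diag(mu_p, var_p, mu_q, var_q):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    mean_term = 0.0
    var_term = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        v_avg = 0.5 * (vp + vq)
        mean_term += (mp - mq) ** 2 / v_avg
        var_term += math.log(v_avg / math.sqrt(vp * vq))
    return 0.125 * mean_term + 0.5 * var_term


def weighted_sym_kl(w_p, mu_p, var_p, w_q, mu_q, var_q):
    """HYPOTHETICAL weighted distance (an assumption, not the paper's formula):
    each direction of the K-L divergence is scaled by the mixture weight
    of the component it starts from."""
    return (w_p * kl_diag(mu_p, var_p, mu_q, var_q)
            + w_q * kl_diag(mu_q, var_q, mu_p, var_p))


if __name__ == "__main__":
    # Two one-dimensional components with equal means and different variances.
    p = ([0.0], [1.0])
    q = ([0.0], [4.0])
    print("KL(p||q):      ", kl_diag(*p, *q))
    print("KL(q||p):      ", kl_diag(*q, *p))
    print("Bhattacharyya: ", bhattacharyya_diag(*p, *q))
    # The weighted value changes with the mixture weights, while the
    # two conventional measures above ignore them entirely.
    print("weighted (0.9, 0.1):", weighted_sym_kl(0.9, *p, 0.1, *q))
    print("weighted (0.1, 0.9):", weighted_sym_kl(0.1, *p, 0.9, *q))
```

In this sketch, swapping the mixture weights changes the weighted value even though the distribution parameters are unchanged, which is the kind of sensitivity that a measure based only on means and variances cannot provide.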
