Likelihood-based non-uniform allocation of Gaussian kernels in scalar dimension for HMM compression

A new likelihood-based, non-uniform allocation of Gaussian kernels in the scalar (feature) dimension is proposed to compress complex, Gaussian-mixture-based continuous-density HMMs into computationally efficient, small-footprint models. The previously proposed Kullback-Leibler divergence-based (KLD-based) allocation (Li et al., 2005) aims to make the compressed model a better representation of the original model; the likelihood-based approach instead aims to make the compressed model a better representation of the training data. The allocation exploits the fact that, at a uniform representation resolution, different features contribute unequally to the likelihood. Our experiments on the Resource Management database show that likelihood-based allocation outperforms both uniform allocation and KLD-based non-uniform allocation, owing to its better representation of the training data.
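The abstract does not spell out the allocation procedure, so the following is only an illustrative sketch of how a likelihood-driven, non-uniform allocation could work, not the authors' algorithm. It assumes a hypothetical precomputed table `loglik[d, k-1]` holding the training-data log-likelihood of feature dimension `d` when that dimension is represented with `k` scalar Gaussian kernels, and it greedily gives each additional kernel to the dimension with the largest likelihood gain. The function name `allocate_kernels`, the table layout, and the greedy rule are all assumptions made for illustration.

```python
import heapq

import numpy as np


def allocate_kernels(loglik, budget):
    """Greedily distribute `budget` kernels over feature dimensions.

    loglik[d, k-1] is an assumed, precomputed training-data log-likelihood
    of dimension d when it is represented with k kernels.
    Returns an integer array with the number of kernels per dimension.
    """
    loglik = np.asarray(loglik, dtype=float)
    n_dims, max_k = loglik.shape
    if not n_dims <= budget <= n_dims * max_k:
        raise ValueError("budget must lie between n_dims and n_dims * max_k")

    # Every dimension starts with one kernel.
    counts = np.ones(n_dims, dtype=int)

    # Max-heap (via negated gains) of the likelihood gain obtained by
    # adding one more kernel to each dimension.
    heap = []
    if max_k > 1:
        heap = [(-(loglik[d, 1] - loglik[d, 0]), d) for d in range(n_dims)]
        heapq.heapify(heap)

    for _ in range(budget - n_dims):
        _, d = heapq.heappop(heap)
        counts[d] += 1
        k = counts[d]
        if k < max_k:
            # Gain of going from k to k + 1 kernels in dimension d.
            heapq.heappush(heap, (-(loglik[d, k] - loglik[d, k - 1]), d))
    return counts


# Toy table (made-up numbers): three dimensions, up to four kernels each.
toy_loglik = [
    [-100.0, -60.0, -50.0, -48.0],  # large gains: deserves more kernels
    [-100.0, -95.0, -93.0, -92.0],  # small gains: one kernel is nearly enough
    [-100.0, -70.0, -65.0, -64.0],
]
allocation = allocate_kernels(toy_loglik, budget=6)
print(allocation.tolist())  # [3, 1, 2]
```

With the same budget of six kernels, a uniform allocation would assign two kernels to every dimension; the greedy rule instead shifts kernels toward the dimensions whose likelihood benefits most, which is the intuition behind a non-uniform allocation.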

[1] Frank K. Soong, et al. Optimal clustering and non-uniform allocation of Gaussian kernels in scalar dimension for HMM compression, 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[2] Jay G. Wilpon, et al. Discriminative analysis for feature reduction in automatic speech recognition, 1992, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-92).

[3] Robert M. Gray, et al. An Algorithm for Vector Quantizer Design, 1980, IEEE Trans. Commun.

[4] Biing-Hwang Juang, et al. Optimal quantization of LSP parameters, 1993, IEEE Trans. Speech Audio Process.

[5] Brian Kan-Wing Mak, et al. Subspace distribution clustering hidden Markov model, 2001, IEEE Trans. Speech Audio Process.

[6] Xuedong Huang, et al. Semi-continuous hidden Markov models for speech signals, 1990.

[7] Hagai Attias, et al. A Variational Bayesian Framework for Graphical Models, 1999.

[8] Steve J. Young, et al. Tree-Based State Tying for High Accuracy Modelling, 1994, HLT.

[9] Biing-Hwang Juang, et al. Minimum classification error rate methods for speech recognition, 1997, IEEE Trans. Speech Audio Process.

[10] S. Kullback, et al. Information Theory and Statistics, 1959.

[11] B. Juang, et al. Context-dependent Phonetic Hidden Markov Models for Speaker-independent Continuous Speech Recognition, 2008.

[12] Satoshi Takahashi, et al. Four-level tied-structure for efficient representation of acoustic modeling, 1995, International Conference on Acoustics, Speech, and Signal Processing.

[13] Kai-Fu Lee, et al. Context-independent phonetic hidden Markov models for speaker-independent continuous speech recognition, 1990.

[14] Frank K. Soong, et al. Hidden Markov models with divergence based vector quantized variances, 1999, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP99).