Comparison of subspace methods for Gaussian mixture models in speech recognition

Speech recognizers typically use high-dimensional feature vectors to capture the cues essential for recognition. The acoustics are then commonly modeled with a Hidden Markov Model (HMM) that uses Gaussian Mixture Models (GMMs) as observation probability density functions. Unrestricted Gaussian parameters can make the models intolerably expensive in both evaluation and storage, which limits their practical use to high-end systems. The classical approach to these problems is to assume independent features and constrain the covariance matrices to be diagonal. This can be thought of as constraining the second-order parameters to lie in a fixed subspace spanned by rank-1 terms. In this paper we discuss the differences between recently proposed subspace methods for GMMs, with emphasis on the applicability of the models to a practical large-vocabulary continuous speech recognition (LVCSR) system.
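The rank-1 view of the diagonal constraint can be illustrated with a minimal sketch. It is not taken from the paper: the helper names, the toy variances, and the 39-dimensional feature size used in the parameter count are assumptions chosen only for illustration. A diagonal covariance matrix is written as a weighted sum of fixed rank-1 basis matrices e_k e_k^T, so only the weights are free parameters.

```python
# Illustrative sketch only; all names and values are hypothetical.

def outer(u, v):
    """Outer product u v^T as a list of rows (a rank-1 matrix)."""
    return [[ui * vj for vj in v] for ui in u]

def unit_vector(k, dim):
    """k-th standard basis vector e_k of length dim."""
    return [1.0 if i == k else 0.0 for i in range(dim)]

def weighted_sum(weights, bases):
    """Return sum_k weights[k] * bases[k] for square matrices."""
    dim = len(bases[0])
    total = [[0.0] * dim for _ in range(dim)]
    for w, basis in zip(weights, bases):
        for i in range(dim):
            for j in range(dim):
                total[i][j] += w * basis[i][j]
    return total

def second_order_param_count(dim, diagonal):
    """Free second-order parameters per Gaussian."""
    return dim if diagonal else dim * (dim + 1) // 2

dim = 3
variances = [0.5, 2.0, 1.5]  # toy per-feature variances (assumed)
# Fixed subspace: one rank-1 basis matrix e_k e_k^T per feature.
bases = [outer(unit_vector(k, dim), unit_vector(k, dim)) for k in range(dim)]
covariance = weighted_sum(variances, bases)

for row in covariance:
    print(row)

# Cost comparison for an assumed 39-dimensional feature vector.
print(second_order_param_count(39, diagonal=False))  # full covariance
print(second_order_param_count(39, diagonal=True))   # diagonal covariance
```

Replacing the fixed basis e_k e_k^T with a different or learned set of basis matrices shared across Gaussians gives the general shape of a subspace constraint; the sketch does not reproduce any specific method from the paper.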
