Modeling with a subspace constraint on inverse covariance matrices

We consider a family of Gaussian mixture models for use in HMM-based speech recognition systems. These “SPAM” models have state-independent choices of subspaces to which the precision (inverse covariance) matrices and means are restricted. They provide a flexible tool for robust, compact, and fast acoustic modeling. The focus of this paper is on the case where the means are unconstrained. The models in this case already generalize the recently introduced EMLLT models, which themselves interpolate between MLLT and full covariance models. We describe an algorithm to train both the state-dependent and state-independent parameters. Results are reported on one speech recognition task. The SPAM models yield significant improvements in accuracy over EMLLT models with comparable model size and runtime speed. We find that a relative reduction in error rate over an MLLT model can be obtained while decreasing the acoustic modeling time by .
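The sketch below illustrates one way a subspace constraint on precision matrices can be evaluated cheaply. It assumes each Gaussian's precision is written as P_j = sum_k lam[j, k] * S_k over D state-independent symmetric basis matrices S_k, with the mean mu_j unconstrained. (EMLLT corresponds to rank-one S_k = a_k a_k^T, MLLT to the case D = d, and full covariance to D = d(d+1)/2.) Expanding the Gaussian log-likelihood gives const_j + (P_j mu_j)^T x - 0.5 * sum_k lam[j, k] * x^T S_k x. The quadratic features x^T S_k x depend only on the frame, so they are computed once and shared by every Gaussian. The per-Gaussian work then drops to O(d + D). The basis, the weights, the sizes, and the helper names (`precompute_gaussian`, `frame_features`, `fast_loglik`, `direct_loglik`) are hypothetical illustrations. This is not the paper's training algorithm, and it is not a definitive implementation.

```python
import numpy as np

# Sketch: SPAM-style Gaussian evaluation with unconstrained means (assumed form).
# Precision of Gaussian j: P_j = sum_k lam_j[k] * S_k, with the S_k shared by all states.

def precision(lam_j, basis):
    """Assemble P_j = sum_k lam_j[k] * S_k from a (D, d, d) basis."""
    return np.einsum("k,kij->ij", lam_j, basis)

def precompute_gaussian(lam_j, mu_j, basis):
    """Per-Gaussian constant and linear term, computed once at model-load time."""
    P = precision(lam_j, basis)
    sign, logdet = np.linalg.slogdet(P)
    if sign <= 0:
        raise ValueError("lam_j does not give a positive definite precision")
    psi = P @ mu_j  # linear term P_j mu_j
    d = mu_j.shape[0]
    const = 0.5 * logdet - 0.5 * d * np.log(2.0 * np.pi) - 0.5 * mu_j @ psi
    return const, psi

def frame_features(x, basis):
    """f_k(x) = x^T S_k x; computed once per frame and shared by all Gaussians."""
    return np.einsum("i,kij,j->k", x, basis, x)

def fast_loglik(x, feats, lam_j, const, psi):
    """log N(x; mu_j, P_j^{-1}) using O(d + D) work per Gaussian."""
    return const + psi @ x - 0.5 * lam_j @ feats

def direct_loglik(x, lam_j, mu_j, basis):
    """Reference evaluation that forms P_j explicitly (O(d^2) per Gaussian)."""
    P = precision(lam_j, basis)
    _, logdet = np.linalg.slogdet(P)
    diff = x - mu_j
    return 0.5 * logdet - 0.5 * x.shape[0] * np.log(2.0 * np.pi) - 0.5 * diff @ P @ diff

# Toy example with hypothetical sizes: feature dimension d, basis size D.
rng = np.random.default_rng(0)
d, D, n_gauss = 4, 6, 3
B = rng.standard_normal((D, d, d))
basis = B @ B.transpose(0, 2, 1)  # S_k = B_k B_k^T: symmetric PSD (assumption for this demo)
lam = rng.uniform(0.1, 1.0, size=(n_gauss, D))  # positive weights keep each P_j positive definite
mus = rng.standard_normal((n_gauss, d))
x = rng.standard_normal(d)

feats = frame_features(x, basis)  # shared across all Gaussians for this frame
for j in range(n_gauss):
    const, psi = precompute_gaussian(lam[j], mus[j], basis)
    print(j, fast_loglik(x, feats, lam[j], const, psi), direct_loglik(x, lam[j], mus[j], basis))
```

The demo uses a PSD basis with positive weights only because that guarantees positive definite precisions without extra checks. A general subspace model may use indefinite basis matrices, as long as each combined P_j stays positive definite. The saving over full covariance comes from D being smaller than d(d+1)/2 while many Gaussians share one set of frame features.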
