Discriminative estimation of subspace precision and mean (SPAM) models

The SPAM model was recently proposed as a very general method for modeling Gaussians with constrained means and covariances. It has been shown to yield significant error rate improvements over other methods of constraining covariances, such as diagonal covariances, semi-tied covariances, and extended maximum likelihood linear transformations. In this paper we address the problem of discriminative estimation of SPAM model parameters, in an attempt to further improve its performance. We present discriminative estimation under two criteria: maximum mutual information (MMI) and “error-weighted” training. We show that each of these methods individually yields over 20% relative reduction in word error rate on a digit task, compared with maximum likelihood (ML) estimated SPAM model parameters. We also show that a gain of as much as 28% relative can be achieved by combining the two discriminative estimation techniques. The techniques developed in this paper also apply directly to an extension of SPAM called subspace constrained exponential models.
