Modeling inverse covariance matrices by basis expansion

This paper proposes a new covariance modeling technique for Gaussian mixture models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis, i.e., $\Sigma_j^{-1} = P_j = \sum_{k=1}^{D} \lambda_k^j a_k a_k^T$, with $\lambda_k^j \in \mathbb{R}$ and $a_k \in \mathbb{R}^d$. A generalized EM algorithm is proposed to obtain maximum likelihood parameter estimates for the basis set $\{a_k a_k^T\}_{k=1}^{D}$ and the expansion coefficients $\{\lambda_k^j\}$. This model, called the extended maximum likelihood linear transform (EMLLT) model, is extremely flexible: by varying the number of basis elements from $D = d$ to $D = d(d+1)/2$, one gradually moves from a maximum likelihood linear transform (MLLT) model to a full-covariance model. Experimental results on two speech recognition tasks show that the EMLLT model can give relative gains of up to 35% in the word error rate over a standard diagonal covariance model and 30% over a standard MLLT model.
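The expansion in the abstract can be illustrated with a small numerical sketch. The code below is not the paper's training procedure (no generalized EM step is shown); it only assembles one precision matrix from a shared rank-1 basis and per-Gaussian coefficients. The function name `emllt_precision` and the toy values for the basis and coefficients are assumptions made up for this illustration.

```python
def emllt_precision(basis, lambdas):
    """Assemble P_j = sum_k lambda_k^j * a_k a_k^T for one Gaussian j.

    basis   : list of D vectors a_k, each of length d (shared by all Gaussians)
    lambdas : list of D real coefficients lambda_k^j (specific to Gaussian j)
    """
    d = len(basis[0])
    P = [[0.0] * d for _ in range(d)]
    for a_k, lam in zip(basis, lambdas):
        for r in range(d):
            for c in range(d):
                # Add lam times the (r, c) entry of the outer product a_k a_k^T.
                P[r][c] += lam * a_k[r] * a_k[c]
    return P


# Toy setting (hypothetical values): d = 2, so D ranges from d = 2 to d(d+1)/2 = 3.
d = 2
basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# D = d: only the first d basis vectors are used. With the rows of a matrix A
# as the a_k, this is the form A^T diag(lambda) A.
P_small = emllt_precision(basis[:d], [2.0, 1.0])

# D = d(d+1)/2: enough rank-1 terms to span every symmetric d x d matrix,
# provided the a_k a_k^T are linearly independent (true for this toy basis).
P_full = emllt_precision(basis, [2.0, 1.0, 0.5])

print(P_small)
print(P_full)
```

Because the coefficients are allowed to be any real numbers, nothing in the sum itself guarantees a valid precision matrix. A real implementation would have to keep each $P_j$ positive definite, which this sketch does not check.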
