Subspace constrained Gaussian mixture models for speech recognition

A standard approach to automatic speech recognition uses hidden Markov models whose state-dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are constrained to lie in an affine subspace shared by all the Gaussians. This class of models includes Gaussian models with linear constraints placed on the precision (inverse covariance) matrices (such as diagonal covariance, maximum likelihood linear transformation, or extended maximum likelihood linear transformation), as well as the LDA/HLDA models used for feature selection, which tie the part of the Gaussians in the directions not used for discrimination. In this paper, we present algorithms for training these models using a maximum likelihood criterion. We present experiments on both small vocabulary, resource constrained, grammar-based tasks and large vocabulary, unconstrained resource tasks to explore the rather large parameter space of models that fit within our framework. In particular, we demonstrate that significant improvements can be obtained in both word error rate and computational complexity.
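The exponential-model view described above can be checked numerically. The following is a minimal sketch, not the paper's implementation: the function names, the feature ordering, and the choice to fold the factor -1/2 into the quadratic features are all illustrative assumptions. It writes a Gaussian log-density as a weight vector dotted with linear and quadratic features minus a log-normalizer, and then shows that a diagonal-covariance Gaussian has a weight vector lying in a subspace spanned by a fixed basis.

```python
import numpy as np

def gaussian_logpdf(x, mu, P):
    """Log-density of N(mu, P^{-1}), with P the precision matrix."""
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(P)
    return 0.5 * (logdet - d * np.log(2.0 * np.pi)) - 0.5 * diff @ P @ diff

def features(x):
    """Linear features x followed by quadratic features -0.5 * x x^T (flattened).
    The -0.5 scaling is a convention chosen for this sketch."""
    return np.concatenate([x, -0.5 * np.outer(x, x).ravel()])

def weights(mu, P):
    """Weight vector paired with features(): (P mu, P flattened)."""
    return np.concatenate([P @ mu, P.ravel()])

def log_normalizer(mu, P):
    """Log-partition term that makes the exponential model integrate to one."""
    d = mu.shape[0]
    _, logdet = np.linalg.slogdet(P)
    return 0.5 * (d * np.log(2.0 * np.pi) - logdet + mu @ P @ mu)

def exp_model_logpdf(x, mu, P):
    """Same Gaussian log-density, written as weights . features - log Z."""
    return weights(mu, P) @ features(x) - log_normalizer(mu, P)

def diag_basis(d):
    """Hypothetical basis whose column span contains the weight vector of
    every diagonal-precision Gaussian in d dimensions (means unconstrained)."""
    B = np.zeros((d + d * d, 2 * d))
    for i in range(d):
        B[i, i] = 1.0                 # i-th linear weight is free
        E = np.zeros((d, d))
        E[i, i] = 1.0
        B[d:, d + i] = E.ravel()      # precision restricted to span{e_i e_i^T}
    return B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 3
    x, mu = rng.normal(size=d), rng.normal(size=d)
    P = np.diag(rng.uniform(0.5, 2.0, size=d))      # diagonal precision
    print(gaussian_logpdf(x, mu, P), exp_model_logpdf(x, mu, P))
    lam = np.concatenate([P @ mu, np.diag(P)])      # subspace coordinates
    print(np.allclose(diag_basis(d) @ lam, weights(mu, P)))
```

In this sketch the diagonal case uses a basis of unit directions; the more general constraints named in the abstract would correspond to other choices of shared basis (and possibly an affine offset), which the sketch does not attempt to reproduce.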
