Dimensional reduction, covariance modeling, and computational complexity in ASR systems

We study acoustic modeling for speech recognition using mixtures of exponential models with linear and quadratic features tied across all context-dependent states. These models are one version of the SPAM models introduced by Axelrod, Gopinath and Olsen (see Proc. ICSLP, 2002), and they generalize diagonal covariance, MLLT, EMLLT, and full covariance models. Reducing the dimension of the acoustic vectors with an LDA/HDA projection corresponds to a special case of reducing the exponential model's feature space. On one speech recognition task, we find that SPAM models on LDA-projected spaces of varying dimension achieve a significant fraction of the WER improvement obtained in going from MLLT to full covariance modeling, while retaining the low computational cost of MLLT models. Further, the feature precomputation cost can be minimized using the hybrid feature technique of Visweswariah, Olsen, Gopinath and Axelrod (see ICASSP 2003), and the number of Gaussians that must be evaluated can be greatly reduced by hierarchically clustering the Gaussians (with the feature space held fixed). Finally, we show that reducing the quadratic and linear feature spaces separately yields models with better accuracy than LDA/HDA-based models at comparable computational cost.
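To make the cost argument concrete, here is a minimal pure-Python sketch. It assumes the subspace-constrained precision formulation, in which each Gaussian's precision matrix is a linear combination P_g = sum_k lambda_{g,k} S_k of symmetric basis matrices S_k shared by all Gaussians, and it assumes the linear features are simply the raw vector x (no linear-subspace reduction, no hybrid features, no Gaussian clustering). All function names and the example basis are hypothetical. Expanding the log-density gives log N(x) = c_g + sum_k lambda_{g,k} f_k(x) + psi_g^T x, where f_k(x) = -1/2 x^T S_k x and psi_g = P_g mu_g. The quadratic features f_k(x) are computed once per frame, after which each Gaussian costs only O(D + d) multiply-adds, rather than the O(d^2) of a direct full-covariance evaluation.

```python
import math

# Hypothetical helpers for an illustrative SPAM-style likelihood sketch.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, v):
    return [dot(row, v) for row in A]

def quad_form(A, v):
    # v^T A v
    return dot(v, mat_vec(A, v))

def combine(lams, basis):
    # P = sum_k lams[k] * basis[k]; symmetric if every basis matrix is symmetric
    d = len(basis[0])
    return [[sum(l * S[i][j] for l, S in zip(lams, basis)) for j in range(d)]
            for i in range(d)]

def log_det_pd(P):
    # log det P via Cholesky P = L L^T; raises ValueError if P is not positive definite
    d = len(P)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("precision matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(d))

def prepare_gaussian(lams, mu, basis):
    # Offline, once per Gaussian: psi = P mu and the x-independent constant c_g.
    P = combine(lams, basis)
    psi = mat_vec(P, mu)
    d = len(mu)
    const = 0.5 * log_det_pd(P) - 0.5 * d * math.log(2.0 * math.pi) - 0.5 * dot(mu, psi)
    return lams, psi, const

def frame_features(x, basis):
    # Once per frame, shared by every Gaussian: f_k(x) = -1/2 x^T S_k x.
    return [-0.5 * quad_form(S, x) for S in basis]

def spam_loglik(gauss, feats, x):
    # Per Gaussian: O(D + d) multiply-adds using the shared frame features.
    lams, psi, const = gauss
    return const + dot(lams, feats) + dot(psi, x)

def direct_loglik(lams, mu, basis, x):
    # Reference: ordinary Gaussian log-density with precision matrix P.
    P = combine(lams, basis)
    diff = [a - b for a, b in zip(x, mu)]
    d = len(x)
    return 0.5 * log_det_pd(P) - 0.5 * d * math.log(2.0 * math.pi) - 0.5 * quad_form(P, diff)

# Example tied basis in d = 3 with D = 4 symmetric matrices: the three diagonal
# elements plus one off-diagonal coupling between dimensions 0 and 1.
EXAMPLE_BASIS = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]

if __name__ == "__main__":
    gaussians = [([2.0, 1.5, 1.0, 0.5], [0.0, 1.0, -1.0]),
                 ([1.0, 1.0, 3.0, -0.3], [0.5, 0.0, 2.0])]
    prepared = [prepare_gaussian(l, m, EXAMPLE_BASIS) for l, m in gaussians]
    x = [0.3, -0.2, 1.1]
    feats = frame_features(x, EXAMPLE_BASIS)
    for (l, m), g in zip(gaussians, prepared):
        # The two columns should agree up to floating-point rounding.
        print(spam_loglik(g, feats, x), direct_loglik(l, m, EXAMPLE_BASIS, x))
```

In this parameterization, a basis of the d matrices e_k e_k^T recovers diagonal covariance models, and a basis spanning all symmetric d x d matrices (D = d(d+1)/2) recovers full covariance. The per-frame cost of the sketch's frame_features grows as O(D d^2); that precomputation is the term the hybrid feature technique mentioned above targets, which this sketch does not attempt to reproduce.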

[1] Enrico Bocchieri et al. Vector quantization for the efficient computation of continuous density likelihoods, 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2] Ramesh A. Gopinath et al. Maximum likelihood modeling with Gaussian distributions for classification, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[3] Scott Axelrod et al. Modeling with a subspace constraint on inverse covariance matrices, 2002, INTERSPEECH.

[4] Scott Axelrod et al. Maximum likelihood training of subspaces for inverse covariance modeling, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (ICASSP '03).

[5] Peder A. Olsen et al. Modeling inverse covariance matrices by basis expansion, 2002, IEEE Transactions on Speech and Audio Processing.

[6] Andreas G. Andreou et al. Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition, 1998, Speech Communication.

[7] Mark J. F. Gales et al. Semi-tied covariance matrices for hidden Markov models, 1999, IEEE Transactions on Speech and Audio Processing.

[8] George Saon et al. Maximum likelihood discriminant feature spaces, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (Cat. No.00CH37100).