Improved covariance modeling for maximum likelihood multiple subspace transformations [speech recognition applications]

Maximum likelihood (ML) multiple subspace transformation algorithms, such as semi-tied covariance (STC) and multiple heteroscedastic linear discriminant analysis (HLDA), have yielded significant improvements in speech recognition accuracy. In STC and multiple HLDA, the Gaussian components are partitioned into multiple component sets. Within each set, the full covariance matrices of the Gaussian components, estimated under the ML criterion, are used to estimate the linear transformation for that set. However, a full covariance matrix contains a large number of free parameters and may not be reliably estimated under the ML criterion. An unreliable full covariance leads to an unreliable linear transformation and, ultimately, to poor recognition results. Several algorithms have been proposed to estimate the full covariance more reliably, such as mixtures of inverse covariances (MIC), SPAM, and hierarchical correlation compensation (HCC). In this paper, we combine HCC with STC and multiple HLDA. Experiments on the RM database show that standard STC achieves a 12.47% word error rate (WER) reduction, while our HCC+STC achieves a 19.32% WER reduction.
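To make the dependence of the transformation on the full covariances concrete, the following is a minimal sketch of how a semi-tied transform for a single component set might be estimated from per-component full covariance matrices. It assumes the row-by-row ML update usually associated with semi-tied covariances; the function names `stc_objective` and `estimate_stc_transform`, the inputs `covs` and `counts` (occupancy counts), and the identity initialization are illustrative assumptions, not taken from the paper. It does not implement HCC; any more reliable covariance estimate would simply be passed in as `covs`.

```python
import numpy as np


def stc_objective(A, covs, counts):
    """ML objective (up to constants) for a shared transform A.

    covs:   (M, d, d) full covariance matrices, one per Gaussian component
    counts: (M,) occupancy counts of the components
    """
    _, logdet = np.linalg.slogdet(A)
    # diag_vars[m, i] = a_i W_m a_i^T, the diagonal variance after the transform
    diag_vars = np.einsum("ij,mjk,ik->mi", A, covs, A)
    return float(np.sum(counts * (2.0 * logdet - np.log(diag_vars).sum(axis=1))))


def estimate_stc_transform(covs, counts, n_iter=10):
    """Estimate one semi-tied transform by row-by-row updates (a sketch)."""
    covs = np.asarray(covs, dtype=float)
    counts = np.asarray(counts, dtype=float)
    d = covs.shape[1]
    A = np.eye(d)            # assumed initialization: identity transform
    beta = counts.sum()      # total occupancy of the component set
    for _ in range(n_iter):
        # Diagonal variances under the current transform, held fixed
        # while all rows of A are updated.
        diag_vars = np.einsum("ij,mjk,ik->mi", A, covs, A)
        for i in range(d):
            # G_i = sum_m (gamma_m / sigma_{m,i}^2) W_m
            G = np.einsum("m,mjk->jk", counts / diag_vars[:, i], covs)
            # Column i of inv(A) is proportional to the cofactor row c_i;
            # the update does not depend on the scale of c_i.
            c = np.linalg.inv(A)[:, i]
            cG = np.linalg.solve(G, c)   # G is symmetric, so this is c G^{-1}
            A[i] = cG * np.sqrt(beta / (c @ cG))
    return A
```

Because every row update reads the full covariances through `G`, noise in `covs` propagates directly into `A`, which is the weakness the abstract describes.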