Paper information - Discriminative mixture weight estimation for large Gaussian mixture models


This paper describes a new approach to acoustic modeling for large vocabulary continuous speech recognition (LVCSR) systems. Each phone is modeled with a large Gaussian mixture model (GMM) whose context-dependent mixture weights are estimated with a sentence-level discriminative training criterion. The estimation problem is cast in a neural network framework, which makes it possible to enforce the appropriate constraints on the mixture weight vectors and allows a straightforward training procedure based on steepest descent. Experiments conducted on the Callhome-English and Switchboard databases show a significant improvement in acoustic model performance, and a somewhat smaller improvement when the acoustic and language models are combined.
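The idea of estimating constrained mixture weights by gradient steps can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not say how the constraints are enforced, so the softmax parameterization here is an assumption (one common way to keep weights non-negative and summing to one), and a simple frame-level class-posterior objective stands in for the sentence-level criterion. The names `train_weights`, `lik`, and `labels` are hypothetical, and the Gaussian likelihoods are assumed to be precomputed and held fixed.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax: every row is non-negative and sums to one.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def train_weights(lik, labels, n_classes, steps=300, lr=1.0):
    """Illustrative only. lik: (T, K) fixed likelihoods p(x_t | Gaussian k);
    labels: (T,) integer class ids. Returns (n_classes, K) mixture weights."""
    T, K = lik.shape
    z = np.zeros((n_classes, K))            # unconstrained parameters
    onehot = np.eye(n_classes)[labels]      # (T, C)
    for _ in range(steps):
        w = softmax(z)                      # (C, K) constrained weights
        f = lik @ w.T                       # (T, C) class likelihoods
        class_post = f / f.sum(axis=1, keepdims=True)
        # Posterior of each Gaussian within each class mixture: (T, C, K)
        comp_post = lik[:, None, :] * w[None, :, :] / f[:, :, None]
        err = onehot - class_post           # (T, C)
        # Gradient of the mean log posterior of the correct class w.r.t. z
        grad = np.einsum('tc,tck->ck', err, comp_post - w[None]) / T
        z += lr * grad                      # steepest descent on the negative objective
    return softmax(z)
```

Because the updates act on the unconstrained parameters `z`, the returned weights satisfy the constraints at every step without any projection.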
