Paper information - Minimum Phone Error and I-smoothing for improved discriminative training

In this paper, we introduce the Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria for the discriminative training of HMM systems. The MPE and MWE criteria are smoothed approximations to the phone and word error rates, respectively. We also discuss I-smoothing, a novel technique for smoothing discriminative training criteria using statistics for maximum likelihood estimation (MLE). Experiments were performed on the Switchboard/Call Home corpora of telephone conversations with up to 265 hours of training data. For the maximum mutual information estimation (MMIE) criterion, I-smoothing reduces the word error rate (WER) by 0.4% absolute over the MMIE baseline. The combination of MPE and I-smoothing gives an improvement of 1% over MMIE and a total reduction in WER of 4.8% absolute over the original MLE system.
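The two ideas in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: it assumes a small N-best list in place of lattices, and every name in it (`expected_accuracy`, `i_smooth`, `tau`, the tuple layouts) is hypothetical. The first function computes a posterior-weighted average of per-hypothesis phone accuracy, which is one way to read "smoothed approximation to the phone error rate". The second shows one common formulation of I-smoothing, in which `tau` points of ML statistics are added to the numerator statistics of a Gaussian.

```python
import math


def expected_accuracy(hyps):
    """Posterior-weighted phone accuracy over an N-best list.

    hyps: list of (log_score, phone_accuracy) pairs, where log_score is an
    already-scaled combined acoustic + language model log score (assumption).
    """
    top = max(score for score, _ in hyps)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(score - top) for score, _ in hyps]
    total = sum(weights)
    return sum(w * acc for w, (_, acc) in zip(weights, hyps)) / total


def i_smooth(num_occ, num_sum, ml_occ, ml_sum, tau):
    """Add tau points of ML statistics to the numerator statistics.

    Inputs are scalar occupancy counts and first-order sums for a single
    Gaussian dimension (a simplification made for illustration only).
    """
    ml_mean = ml_sum / ml_occ
    return num_occ + tau, num_sum + tau * ml_mean


# Two equally likely hypotheses with 3 and 1 correct phones.
avg = expected_accuracy([(0.0, 3.0), (0.0, 1.0)])
smoothed_occ, smoothed_sum = i_smooth(10.0, 20.0, 4.0, 12.0, tau=2.0)
```

As the scores sharpen toward a single hypothesis, the average approaches that hypothesis's accuracy, which is why the criterion behaves like a smoothed error count. A larger `tau` pulls the smoothed statistics further toward the ML estimate.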

Daniel Povey | Philip C. Woodland