Minimum Phone Error and I-smoothing for improved discriminative training

In this paper we introduce the Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria for the discriminative training of HMM systems. The MPE/MWE criteria are smoothed approximations to the phone or word error rate respectively. We also discuss I-smoothing which is a novel technique for smoothing discriminative training criteria using statistics for maximum likelihood estimation (MLE). Experiments have been performed on the Switchboard/Call Home corpora of telephone conversations with up to 265 hours of training data. It is shown that for the maximum mutual information estimation (MMIE) criterion, I-smoothing reduces the word error rate (WER) by 0.4% absolute over the MMIE baseline. The combination of MPE and I-smoothing gives an improvement of 1 % over MMIE and a total reduction in WER of 4.8% absolute over the original MLE system.

[1]  D. Matula,et al.  Foundations of Finite Precision Rational Arithmetic , 1980 .

[2]  Peter Kornerup,et al.  Finite Precision Rational Arithmetic: An Arithmetic Unit , 1983, IEEE Transactions on Computers.

[3]  Warren E. Ferguson,et al.  Rationally biased arithmetic , 1985, 1985 IEEE 7th Symposium on Computer Arithmetic (ARITH).

[4]  Peter Kornerup,et al.  Finite Precision Rational Arithmetic: Slash Number Systems , 1983, IEEE Transactions on Computers.

[5]  Lalit R. Bahl,et al.  Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  A. Nadas,et al.  Decoder selection based on cross-entropies , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[7]  Dimitri Kanevsky,et al.  An inequality for rational functions with applications to some statistical estimation problems , 1991, IEEE Trans. Inf. Theory.

[8]  Yves Normandin,et al.  Hidden Markov models, maximum mutual information estimation, and the speech recognition problem , 1992 .

[9]  Shigeru Katagiri,et al.  Prototype-based minimum classification error/generalized probabilistic descent training for various speech units , 1994, Comput. Speech Lang..

[10]  Günther Ruske,et al.  Discriminative training for continuous speech recognition , 1995, EUROSPEECH.

[11]  Biing-Hwang Juang,et al.  Minimum classification error rate methods for speech recognition , 1997, IEEE Trans. Speech Audio Process..

[12]  Steve J. Young,et al.  MMIE training of large vocabulary recognition systems , 1997, Speech Communication.

[13]  Ralf Schlüter,et al.  Investigations on discriminative training criteria , 2000 .

[14]  Daniel Povey,et al.  Improved discriminative training techniques for large vocabulary continuous speech recognition , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[15]  Daniel Povey,et al.  Large scale discriminative training of hidden Markov models for speech recognition , 2002, Comput. Speech Lang..