Implementing and Improving MMIE Training in SphinxTrain

Discriminative training criteria such as Maximum Mutual Information Estimation (MMIE) have been used to improve the accuracy of speech recognition systems trained using Maximum Likelihood Estimation (MLE). In this paper, we present the implementation details of MMIE training in SphinxTrain and report baseline results for MMIE training on the Wall Street Journal (WSJ) SI84 and SI284 data sets. We also introduce an efficient lattice pruning technique that both speeds up training and increases the gain in recognition accuracy obtained from MMIE training. The proposed technique, based on posterior probability pruning, is shown to outperform MMIE training with standard pruning techniques.
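As a rough illustration of what posterior probability pruning of a lattice can look like, the following minimal sketch computes arc posteriors with a forward-backward pass and drops arcs whose posterior falls below a threshold. It is not the SphinxTrain implementation: the lattice representation (tuples of `src`, `dst`, `label`, `log_score` with integer node ids in topological order), the function names, and the threshold value are all assumptions made for this example, and a real MMIE lattice would combine scaled acoustic and language model scores on each arc.

```python
import math

NEG_INF = -math.inf


def log_add(a, b):
    """Return log(exp(a) + exp(b)) in a numerically stable way."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def arc_posteriors(arcs, start, end):
    """Posterior probability of each arc in an acyclic lattice.

    arcs: list of (src, dst, label, log_score). Hypothetical format:
    node ids are integers with src < dst, so sorting by node id gives
    a valid topological order.
    """
    # Forward pass: alpha[n] = log of the total score of paths start -> n.
    alpha = {start: 0.0}
    for src, dst, _, w in sorted(arcs, key=lambda a: a[0]):
        score = alpha.get(src, NEG_INF) + w
        alpha[dst] = log_add(alpha.get(dst, NEG_INF), score)

    # Backward pass: beta[n] = log of the total score of paths n -> end.
    beta = {end: 0.0}
    for src, dst, _, w in sorted(arcs, key=lambda a: a[1], reverse=True):
        score = w + beta.get(dst, NEG_INF)
        beta[src] = log_add(beta.get(src, NEG_INF), score)

    total = alpha.get(end, NEG_INF)
    if total == NEG_INF:
        raise ValueError("end node is not reachable from start node")

    # Posterior of an arc = score of all paths through it / total score.
    return [
        math.exp(alpha.get(src, NEG_INF) + w + beta.get(dst, NEG_INF) - total)
        for src, dst, _, w in arcs
    ]


def prune_by_posterior(arcs, start, end, threshold):
    """Keep only arcs whose posterior is at least `threshold`."""
    posteriors = arc_posteriors(arcs, start, end)
    return [arc for arc, p in zip(arcs, posteriors) if p >= threshold]


if __name__ == "__main__":
    # Toy lattice: two competing first words, one shared second word.
    lattice = [
        (0, 1, "the", math.log(0.9)),
        (0, 1, "a", math.log(0.1)),
        (1, 2, "cat", math.log(1.0)),
    ]
    for arc in prune_by_posterior(lattice, 0, 2, threshold=0.2):
        print(arc[2])
```

In this toy lattice the arc labeled "a" carries a posterior of about 0.1 and is removed at a threshold of 0.2, while "the" and "cat" are kept. Unlike a beam applied to raw path scores, a posterior threshold is normalized per lattice, so the same value has a comparable meaning across utterances of different lengths.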
