Minimum exact word error training

In this paper we present the minimum exact word error (exactMWE) training criterion for optimising the parameters of large-scale speech recognition systems. The exactMWE criterion resembles the minimum word error (MWE) criterion in that it minimises the expected word error, but it uses the exact word error instead of the time-alignment-based approximation employed by MWE. It is shown that the exact word error for all word sequence hypotheses can be represented on a word lattice. This can be accomplished using transducer-based methods, yielding a word lattice of slightly refined topology in which the accumulated weight of each path represents the exact number of word errors of the corresponding word sequence hypothesis. Using this compressed representation of the word errors of all word sequences in the original lattice, exactMWE can be performed with the same lattice-based re-estimation process as MWE training. First experiments on the Wall Street Journal dictation task show no significant differences in recognition performance between exactMWE and MWE, at comparable computational complexity and convergence behaviour of the training.
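To make the lattice-based error representation concrete, here is a minimal Python sketch under assumptions of my own: the toy lattice, the reference sentence, and the helper names (levenshtein, paths, oracle_error) are illustrative and not the paper's implementation, which applies weighted-transducer operations to full recognition lattices. The sketch composes the lattice with a Levenshtein edit-distance machine for the reference, so that accumulated weights along composed paths count exact word errors, and verifies the per-hypothesis error counts by enumeration.

# Toy illustration of the error representation described above. All names,
# the toy lattice, and the reference are illustrative assumptions.
import heapq


def levenshtein(hyp, ref):
    """Exact word-level edit distance between two word sequences."""
    # d[j] = edit distance between the hypothesis prefix seen so far and ref[:j]
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,          # insertion error (extra hyp word)
                       d[j - 1] + 1,      # deletion error (missed ref word)
                       prev + (h != r))   # substitution error or match
            prev = cur
    return d[-1]


def paths(lattice, state, final):
    """Enumerate all word sequences through an acyclic lattice."""
    if state == final:
        yield []
        return
    for word, nxt in lattice.get(state, []):
        for rest in paths(lattice, nxt, final):
            yield [word] + rest


def oracle_error(lattice, start, final, ref):
    """Tropical-semiring shortest distance over the composition of the lattice
    with the edit-distance machine of ref: the minimum exact word error over
    all hypotheses in the lattice."""
    best = {(start, 0): 0}
    heap = [(0, start, 0)]  # (cost, lattice state, reference words consumed)
    while heap:
        cost, s, j = heapq.heappop(heap)
        if cost > best[(s, j)]:
            continue  # stale heap entry
        if j < len(ref):                      # skip a reference word: deletion
            _relax(best, heap, s, j + 1, cost + 1)
        for word, nxt in lattice.get(s, []):
            _relax(best, heap, nxt, j, cost + 1)            # insertion
            if j < len(ref):                                # match/substitution
                _relax(best, heap, nxt, j + 1, cost + (word != ref[j]))
    return best[(final, len(ref))]


def _relax(best, heap, s, j, cost):
    if cost < best.get((s, j), float("inf")):
        best[(s, j)] = cost
        heapq.heappush(heap, (cost, s, j))


if __name__ == "__main__":
    ref = "the cat sat".split()
    # toy acyclic lattice: state -> [(word, next state)]; start state 0, final 3
    lattice = {
        0: [("the", 1), ("a", 1)],
        1: [("cat", 2), ("cats", 2), ("cat", 3)],  # the arc to 3 skips "sat"
        2: [("sat", 3)],
    }
    for hyp in paths(lattice, 0, 3):
        print(" ".join(hyp), "->", levenshtein(hyp, ref), "errors")
    print("oracle word error:", oracle_error(lattice, 0, 3, ref))

Running the sketch prints the exact error count of every hypothesis in the toy lattice together with the minimum over the lattice. In the paper, transducer operations instead refine the lattice topology so that each path carries its exact error as accumulated weight, which the MWE-style re-estimation then reads directly off the lattice.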
