Large vocabulary decoding and confidence estimation using word posterior probabilities

The paper investigates the estimation of word posterior probabilities based on word lattices and presents applications of these posteriors in a large vocabulary speech recognition system. A novel approach to integrating these word posterior probability distributions into a conventional Viterbi decoder is presented. The problem of the robust estimation of confidence scores from word posteriors is examined and a method based on decision trees is suggested. The effectiveness of these techniques is demonstrated on the broadcast news and the conversational telephone speech corpora where improvements both in terms of word error rate and normalised cross entropy were achieved compared to the baseline HTK evaluation systems.

[1]  Mitchel Weintraub,et al.  LVCSR log-likelihood ratio scoring for keyword spotting , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[2]  Mitch Weintraub,et al.  Explicit word error minimization in n-best list rescoring , 1997, EUROSPEECH.

[3]  Hermann Ney,et al.  A comparison of word graph and n-best list based confidence measures , 1999, EUROSPEECH.

[4]  Thomas Hain,et al.  The 1997 HTK broadcast news transcription system , 1998 .

[5]  Peter Regel-Brietzmann,et al.  Word graph rescoring using confidence measures , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6]  Ralf Schlüter,et al.  Using word probabilities as confidence measures , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[7]  Thomas Niesler,et al.  The 1998 HTK system for transcription of conversational telephone speech , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[9]  Andreas Stolcke,et al.  Finding consensus among words: lattice-based word error minimization , 1999, EUROSPEECH.

[10]  Gunnar Evermann,et al.  Minimum Word Error Rate Decoding , 1999 .