One-pass cross-word decoding for large vocabularies based on a lexical tree search organization

This paper describes the new Philips Research decoder, which performs large-vocabulary continuous speech recognition in a single pass with cross-word acoustic models and an m-gram language model (m up to 4), in contrast to our previous multiple-pass technique. The decoder is based on a time-synchronous beam search and a prefix-tree structure of the lexicon. Cross-word transitions are treated dynamically. A language-model look-ahead technique is applied to the bigram probabilities. On a variety of speech data, reduced error rates are obtained together with significant speed-ups, confirming the advantage of an early use of all available knowledge sources. In particular, the search effort of one-pass trigram decoding is only marginally higher than that of bigram decoding, and the integration of cross-word triphones improves the overall accuracy by typically 10% relative.
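
To make the prefix-tree lexicon and the bigram look-ahead concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and not the Philips decoder itself: the toy lexicon, the phoneme symbols, the bigram values, and the names `Node`, `build_prefix_tree`, `lookahead`, and `prune` are all hypothetical. Each tree node stores the best bigram probability of any word still reachable below it, and the beam-pruning step adds that anticipated score to a hypothesis before comparing it with the best one.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # phoneme -> Node
    words: list = field(default_factory=list)     # words whose pronunciation ends here
    la: float = 0.0                               # look-ahead probability for this node


def build_prefix_tree(lexicon):
    """Merge pronunciations that share a phoneme prefix into one tree."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.children.setdefault(p, Node())
        node.words.append(word)
    return root


def lookahead(node, bigram, prev, floor=1e-6):
    """Store in node.la the max of p(w | prev) over all words reachable from node."""
    best = max((bigram.get((prev, w), floor) for w in node.words), default=0.0)
    for child in node.children.values():
        best = max(best, lookahead(child, bigram, prev, floor))
    node.la = best
    return best


def prune(hyps, beam):
    """Keep (log_score, node) hypotheses whose anticipated score is within `beam` of the best."""
    scored = [(s + math.log(n.la), n) for s, n in hyps]
    best = max(s for s, _ in scored)
    return [(s, n) for s, n in scored if s >= best - beam]


# Hypothetical toy data, for illustration only.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spin": ["s", "p", "ih", "n"],
    "tree": ["t", "r", "iy"],
}
BIGRAM = {
    ("the", "speech"): 0.02,
    ("the", "speed"): 0.05,
    ("the", "spin"): 0.001,
    ("the", "tree"): 0.03,
}

root = build_prefix_tree(LEXICON)
lookahead(root, BIGRAM, "the")

node_iy = root.children["s"].children["p"].children["iy"]  # covers "speech", "speed"
node_ih = root.children["s"].children["p"].children["ih"]  # covers "spin"
kept = prune([(-1.0, node_iy), (-1.0, node_ih)], beam=3.0)
```

In this toy setup both hypotheses have the same acoustic log score, yet the look-ahead lets the search drop the branch leading to "spin" before its word end is reached, which is the sense in which the language model is used early. A real decoder would recompute or cache the look-ahead values per predecessor word and work with scaled log probabilities throughout.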
