Dynamic network decoding revisited

We present a dynamic network decoder capable of using large cross-word context models and long n-gram histories. Our method for constructing the search network is designed to process large cross-word context models very efficiently, and we optimize the search network to minimize run-time overhead in the dynamic network decoder. The search procedure uses the full LM history for lookahead, and paths are recombined as early as possible. In a systematic comparison with a static FSM-based decoder, we find that the dynamic decoder can run at a speed comparable to that of the static decoder when large language models are used, while the static decoder performs best for small language models. We discuss the use of very large vocabularies of up to 2.5 million words for both decoding approaches and analyze the effect of weak acoustic models on pruning.
