This paper presents the DUcoder, the LVCSR decoder developed at Duisburg University. The decoder performs a Viterbi search for the most probable word sequence in recognition systems that use HMMs and backoff N-gram language models. In principle, the decoding strategy is similar to that of so-called stack decoders. During development, emphasis was placed on innovations that speed up decoding through carefully chosen approximations. Besides briefly presenting the decoder's overall design, the paper points out the issues that are crucial for speed and recognition performance. Evaluations are carried out on a German LVCSR system with a vocabulary of 100,000 words, word-internal triphones, and a trigram language model. Close-to-real-time performance is achieved at the cost of 12% additional error, while a decoder configuration that runs at around 40 times real time causes no search errors on the evaluation set.
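The trade-off between decoding speed and search errors can be illustrated with a minimal sketch of a Viterbi search with beam pruning. This is not the DUcoder's algorithm: the two-state HMM, its probabilities, and the names `viterbi`, `prune`, and `beam` are assumptions chosen for illustration, and beam pruning stands in here as one generic example of an approximation. The sketch contains no language model and no stack decoding.

```python
import math

def prune(hyps, beam):
    """Keep only hypotheses within `beam` (log domain) of the best one."""
    best = max(score for score, _ in hyps.values())
    return {s: v for s, v in hyps.items() if v[0] >= best - beam}

def viterbi(obs, states, log_init, log_trans, log_emit, beam=math.inf):
    """Return (best_log_score, best_state_path) for an observation sequence.

    beam=inf keeps every hypothesis and gives the exact Viterbi result;
    a finite beam is faster in practice but may lose the best path.
    """
    # active maps each surviving state to (log score, path so far)
    active = {s: (log_init[s] + log_emit[s][obs[0]], [s]) for s in states}
    active = prune(active, beam)
    for o in obs[1:]:
        nxt = {}
        for prev, (score, path) in active.items():
            for s in states:
                cand = score + log_trans[prev][s] + log_emit[s][o]
                if s not in nxt or cand > nxt[s][0]:
                    nxt[s] = (cand, path + [s])
        active = prune(nxt, beam)
    return max(active.values(), key=lambda v: v[0])

def logs(d):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(p) for k, p in d.items()}

# Hypothetical toy model (illustrative numbers only).
states = ["A", "B"]
log_init = logs({"A": 0.5, "B": 0.5})
log_trans = {"A": logs({"A": 0.8, "B": 0.2}), "B": logs({"A": 0.1, "B": 0.9})}
log_emit = {"A": logs({"x": 0.8, "y": 0.2}), "B": logs({"x": 0.2, "y": 0.8})}
obs = ["x", "y", "y"]

exact_score, exact_path = viterbi(obs, states, log_init, log_trans, log_emit)
pruned_score, pruned_path = viterbi(obs, states, log_init, log_trans, log_emit, beam=1.0)
print(exact_path, pruned_path)
```

In this toy model, a beam of 1.0 discards state B after the first frame, so the pruned search returns a lower-scoring path than the exact search. That gap is what is meant by a search error, as opposed to an error caused by the acoustic or language models.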