Minimizing search errors due to delayed bigrams in real-time speech recognition systems

When building applications on top of large-vocabulary speech recognition systems, a certain number of search errors due to pruning often have to be accepted in order to reach the required speed. We address the problems that result from the aggressive pruning strategies typically applied in large-vocabulary systems to achieve close to real-time performance. We consider a typical scenario: a two-pass Viterbi search whose first pass is organized as a phoneme (allophone) tree. For such a tree-organized lexicon, there are two ways to use a bigram language model: building tree copies or using so-called delayed bigrams. Since copying trees turns out to be too expensive for real-time applications, we focus on delayed bigrams, discuss their drastic influence on word accuracy, and show how to alleviate their disastrous effect under aggressive pruning.
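
To make the interaction between delayed bigrams and pruning concrete, the following is a minimal toy sketch, not the method of the paper. It assumes that the word identity is known only at a leaf of the lexicon tree, so beam pruning inside the tree sees acoustic scores only and the bigram score is added at the word end. The function `best_word`, the scores, and the beam widths are all hypothetical values chosen for illustration.

```python
import math


def best_word(acoustic, bigram, prev_word, beam):
    """Pick the next word when the bigram score is applied only at the word end.

    acoustic:  dict word -> acoustic log-score accumulated inside the tree
    bigram:    dict (prev_word, word) -> log P(word | prev_word)
    beam:      hypotheses more than `beam` below the best in-tree score are pruned
    """
    # Inside the tree the word identity is unknown, so pruning sees acoustics only.
    best_in_tree = max(acoustic.values())
    survivors = {w: s for w, s in acoustic.items() if s >= best_in_tree - beam}
    # At the leaf the word is known and the delayed bigram is finally added.
    scored = {w: s + bigram[(prev_word, w)] for w, s in survivors.items()}
    return max(scored, key=scored.get)


# Hypothetical scores: "too" is acoustically best, "to" is best once the bigram counts.
acoustic = {"two": -10.0, "too": -9.0, "to": -9.5}
bigram = {
    ("going", "two"): math.log(0.01),
    ("going", "too"): math.log(0.04),
    ("going", "to"): math.log(0.95),
}

wide = best_word(acoustic, bigram, "going", beam=5.0)    # all hypotheses survive
narrow = best_word(acoustic, bigram, "going", beam=0.3)  # "to" is pruned too early
print(wide, narrow)
```

With the wide beam every hypothesis reaches its leaf and the bigram selects "to". With the narrow beam "to" is discarded before its bigram score can be applied, so the search returns "too": a search error of the kind described above.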
