Smoothed language model incorporation for efficient time-synchronous beam search decoding in LVCSR

For performing the decoding search in large vocabulary continuous speech recognition (LVCSR) with hidden Markov models (HMM) and statistical language models, the most straightforward and popular approach is the time-synchronous beam search procedure. A drawback of this approach is that the language model weight is applied time-asynchronously during the search, which degrades performance, particularly when searching with a tight pruning beam. This study presents a method for smoothing the language model within the recognition network. The optimization goal is to smear the transition probabilities from HMM state to HMM state in favor of a more time-synchronous application of the language model weight. In addition, state-based language model look-ahead is proposed and evaluated. Each language model smoothing technique on its own leads to a remarkable improvement in the accuracy-to-run-time ratio, whereas their combined application yields only limited further gains.
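The two ideas above can be illustrated with a minimal sketch. The toy lexicon, unigram language model, and per-word HMM state counts below are invented for illustration only; the actual method operates on full n-gram contexts within the recognition network.

```python
import math

# Hypothetical toy data (not from the paper): a unigram LM and the
# number of HMM states per word.
lm = {"cat": 0.5, "car": 0.3, "dog": 0.2}
states = {"cat": 4, "car": 4, "dog": 3}

def word_end_scores(word):
    """Conventional scheme: the full LM weight (-log probability) is
    applied only at the word's final HMM state, i.e. time-asynchronously."""
    n = states[word]
    return [0.0] * (n - 1) + [-math.log(lm[word])]

def smeared_scores(word):
    """Smoothed scheme: the LM weight is smeared uniformly over the
    word's HMM state transitions, so beam pruning sees it earlier."""
    n = states[word]
    return [-math.log(lm[word]) / n] * n

def lookahead(prefix):
    """State-based LM look-ahead, sketched on spelled prefixes: at a
    network state, anticipate the best LM score among all words still
    reachable from that state."""
    cands = [p for w, p in lm.items() if w.startswith(prefix)]
    return -math.log(max(cands)) if cands else float("inf")

# Smearing redistributes, but does not change, the total LM weight.
for w in lm:
    assert abs(sum(word_end_scores(w)) - sum(smeared_scores(w))) < 1e-9
```

The key invariant, checked in the loop, is that smearing only changes *when* the LM weight enters the path score, not its total, so the best full-length hypothesis is unaffected while partial hypotheses become more comparable under a tight beam.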