Non-deterministic stochastic language models for speech recognition

Traditional stochastic language models for speech recognition (i.e., n-grams) are deterministic, in the sense that there is one and only one derivation for each given sentence. Moreover, a fixed temporal window is always assumed in their estimation. This paper shows how non-determinism can be introduced to effectively approximate a back-off n-gram language model within a finite state network formalism. It also shows that a new, flexible, and powerful network formalization can be obtained by relaxing the assumption of a fixed history size. The result is a class of automata for language modeling, Variable N-gram Stochastic Automata (VNSAs), for which we propose methods for estimating the transition probabilities. VNSAs have been used in a spontaneous speech recognizer for the ATIS task, and recognition accuracy on a standard test set is reported.
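To make the back-off idea concrete, the following is a minimal sketch of a discounted bigram model that backs off to unigram probabilities when a bigram is unseen. It uses a generic absolute-discounting scheme for illustration only; it is not the estimation method proposed in the paper, and the back-off distribution is not renormalized over unseen successors, as a full Katz back-off would require.

```python
from collections import Counter

def train_backoff_bigram(sentences, discount=0.5):
    """Estimate a discounted bigram model that backs off to unigrams.

    A simplified, generic back-off sketch (absolute discounting),
    not the VNSA transition-probability estimation from the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())

    def prob(word, prev):
        count = bigrams[(prev, word)]
        if count > 0:
            # discounted bigram estimate
            return (count - discount) / unigrams[prev]
        # back-off weight: probability mass freed by discounting
        # all observed successors of `prev`
        seen = [w for (p, w) in bigrams if p == prev]
        alpha = discount * len(seen) / unigrams[prev] if unigrams[prev] else 1.0
        return alpha * unigrams[word] / total

    return prob
```

In an FSA realization, the back-off corresponds to a non-deterministic epsilon transition from the bigram state to a lower-order (unigram) state weighted by alpha, which is precisely where the non-determinism enters the network.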