Traditional stochastic language models for speech recognition (i.e., n-grams) are deterministic, in the sense that there is one and only one derivation for each given sentence. Moreover, a fixed temporal window is always assumed when estimating traditional stochastic language models. This paper shows how non-determinism can be introduced to effectively approximate a back-off n-gram language model through a finite-state network formalism. It also shows that a new, flexible, and powerful network formalism can be obtained by relaxing the assumption of a fixed history size. As a result, a class of automata for language modeling, Variable N-gram Stochastic Automata (VNSAs), is obtained, for which we propose methods for estimating the transition probabilities. VNSAs have been used in a spontaneous speech recognizer for the ATIS task, and accuracy on a standard test set is reported.
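The back-off idea with a variable history size can be illustrated with a minimal sketch (this is a simplified back-off scheme with a fixed penalty weight, in the spirit of the models discussed here, not the paper's exact estimation method; the function names and the `alpha` weight are illustrative assumptions):

```python
from collections import defaultdict

def train_counts(corpus, max_n=3):
    """Count all n-grams up to order max_n from a list of token lists."""
    counts = defaultdict(int)
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        for i in range(len(toks)):
            for n in range(1, max_n + 1):
                if i - n + 1 >= 0:
                    counts[tuple(toks[i - n + 1 : i + 1])] += 1
    return counts

def backoff_prob(counts, history, word, alpha=0.4, unigram_total=None):
    """Estimate P(word | history) using the longest history whose n-gram
    was observed; otherwise shorten the history (variable history size)
    and back off, paying a fixed penalty weight alpha per step."""
    if unigram_total is None:
        unigram_total = sum(c for k, c in counts.items() if len(k) == 1)
    ngram = history + (word,)
    hist_count = counts.get(history, 0) if history else unigram_total
    if counts.get(ngram, 0) > 0 and hist_count > 0:
        return counts[ngram] / hist_count
    if not history:
        return 1e-9  # floor for entirely unseen words
    return alpha * backoff_prob(counts, history[1:], word, alpha, unigram_total)
```

For example, with `counts = train_counts([["a", "b", "c"], ["a", "b", "d"]])`, the seen trigram gives `backoff_prob(counts, ("a", "b"), "c")` = 0.5, while the unseen history `("x", "b")` backs off to the bigram estimate scaled by `alpha`.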