A hybrid language model for a continuous dictation prototype

This paper describes the combination of a stochastic language model and a formal grammar modeled as a unification grammar. The stochastic model is trained on 42 million words extracted from the newspaper Le Monde and is based on smoothed 3-gram and 3-class models. The 3-class model is represented by a Markov chain made up of four states. Several experiments were carried out to determine which values are best for a specific training and test corpus. Experiments indicate that the unification grammar strongly reduces the number of hypotheses (sentences) produced by the stochastic model.
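As an illustration of how a smoothed word 3-gram and a 3-class model can be combined, the following is a minimal sketch and not the system described here. It assumes add-k smoothing, a linear interpolation weight `lam`, and a hand-written `word2class` mapping, none of which are specified in the abstract; the function names `train` and `hybrid_prob` are hypothetical, and the four-state Markov chain representation is not reproduced.

```python
from collections import Counter

def train(sentences, word2class):
    """Collect counts for a word 3-gram model and a class 3-gram model."""
    names = ("w3", "w2", "c3", "c2", "wc", "c")
    counts = {name: Counter() for name in names}
    vocab = set()
    for sent in sentences:
        words = ["<s>", "<s>"] + sent + ["</s>"]
        # Assumption: words without a class (and boundary markers) form their own class.
        classes = [word2class.get(w, w) for w in words]
        for i in range(2, len(words)):
            counts["w3"][tuple(words[i - 2:i + 1])] += 1
            counts["w2"][tuple(words[i - 2:i])] += 1
            counts["c3"][tuple(classes[i - 2:i + 1])] += 1
            counts["c2"][tuple(classes[i - 2:i])] += 1
            counts["wc"][(words[i], classes[i])] += 1
            counts["c"][classes[i]] += 1
            vocab.add(words[i])
    return counts, vocab

def hybrid_prob(w1, w2, w3, counts, vocab, word2class, lam=0.5, k=0.1):
    """Interpolate P(w3 | w1, w2) from the word 3-gram and the 3-class model."""
    n_words = len(vocab)
    n_classes = len(counts["c"])
    c1, c2, c3 = (word2class.get(w, w) for w in (w1, w2, w3))
    # Add-k smoothed word trigram (the smoothing method is an assumption).
    p_word = (counts["w3"][(w1, w2, w3)] + k) / (counts["w2"][(w1, w2)] + k * n_words)
    # Add-k smoothed class trigram, then the probability of the word within its class.
    p_class = (counts["c3"][(c1, c2, c3)] + k) / (counts["c2"][(c1, c2)] + k * n_classes)
    p_member = counts["wc"][(w3, c3)] / counts["c"][c3] if counts["c"][c3] else 0.0
    return lam * p_word + (1 - lam) * p_class * p_member
```

With each word assigned to exactly one class, both components are normalized over the vocabulary, so the interpolated value remains a probability distribution for any `lam` between 0 and 1. In a hybrid setup like the one described, sentence hypotheses scored this way would then be filtered by the grammar.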
