A new based distance language model for a dictation machine: application to MAUD
This paper deals with a stochastic language model based on splitting the word history into its d words, where d is the length of the history. One of our aims is to model the semantic and syntactic relationships between words, and this model can be considered a first step towards that goal. We evaluated the model through the Shannon game (on 10 000 truncated sentences) and implemented it in MAUD, our dictation machine. Tests on MAUD were carried out on 300 sentences spoken by several women and men. In the Shannon game, this model predicts more words than any other method previously developed in our team, even though those earlier models are sophisticated in contrast to the one we describe. Moreover, when unknown words are included, the results are better than those of the models we presented in a recent work, in terms of mean rank, ranks from 1 to 5, and perplexity. This work required two interpolation methods inspired by Markov models. We also discuss the problem of unknown-word modelling.
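The evaluation measures named above (mean rank, ranks from 1 to 5, and perplexity) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `shannon_game_metrics`, the input layout (one word-to-probability dictionary per truncated sentence), and the toy probabilities are all assumptions made for illustration, and it assumes every target word receives a non-zero probability.

```python
import math

def shannon_game_metrics(predictions, targets, top_k=5):
    """Score next-word predictions in a Shannon-game style evaluation.

    predictions: list of dicts mapping candidate word -> probability
                 (hypothetical layout, one dict per truncated sentence).
    targets:     list of the words that actually followed.
    Assumes each target appears in its dict with non-zero probability.
    """
    ranks = []
    log_prob_sum = 0.0
    for dist, target in zip(predictions, targets):
        # Rank candidates from most to least probable; rank 1 is the best guess.
        ranked = sorted(dist, key=dist.get, reverse=True)
        ranks.append(ranked.index(target) + 1)
        log_prob_sum += math.log2(dist[target])
    n = len(targets)
    return {
        "mean_rank": sum(ranks) / n,
        # Fraction of targets found among the top_k candidates.
        "top_k_rate": sum(r <= top_k for r in ranks) / n,
        # Perplexity = 2 ** (average negative log2 probability of the targets).
        "perplexity": 2 ** (-log_prob_sum / n),
    }

# Toy usage with made-up distributions.
metrics = shannon_game_metrics(
    [{"a": 0.5, "b": 0.3, "c": 0.2}, {"a": 0.25, "b": 0.5, "c": 0.125, "d": 0.125}],
    ["a", "a"],
    top_k=1,
)
print(metrics)
```

A lower mean rank and a lower perplexity both indicate that the model places the correct word nearer the top of its candidate list.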