Paper information - Evaluation of the stochastic morphosyntactic language model on a one million word Hungarian dictation task

In this article we evaluate our stochastic morphosyntactic language model (SMLM) on a Hungarian newspaper dictation task that requires modeling over 1 million different word forms. The proposed method uses morphemes as the basic recognition units and combines a morpheme N-gram model with a morphosyntactic language model. The architecture of the recognition system is based on the weighted finite-state transducer (WFST) paradigm. Thanks to the flexible transducer-based architecture, the morphosyntactic component integrates seamlessly with the basic modules, with no need to modify the decoder itself. We compare the phoneme, morpheme, and word error rates as well as the sizes of the recognition networks in two configurations: one uses only the N-gram model, while the other uses the combined model. The proposed stochastic morphosyntactic language model decreases the morpheme error rate by between 1.7% and 7.2% relative to the baseline trigram system. The morpheme error rate of the best configuration is 18% and the best word error rate is 22.3%.
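The error rates and relative reductions quoted above can be illustrated with a minimal sketch. It assumes the conventional definition of a token error rate (Levenshtein distance between reference and hypothesis token sequences, divided by the reference length); the abstract does not state which scoring tool was used. The function names, the placeholder tokens `m1`..`m4`, and the numeric inputs are hypothetical and are not data from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (words or morphemes)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Token error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical placeholder tokens, not taken from the paper's test set.
reference = ["m1", "m2", "m3", "m4"]
hypothesis = ["m1", "x", "m3"]
rate = error_rate(reference, hypothesis)  # one substitution + one deletion over 4 tokens

# Hypothetical error rates, chosen only to show the arithmetic.
gain = relative_reduction(0.20, 0.19)
```

A relative reduction is measured against the baseline error rate, so a drop from a hypothetical 20% to 19% is a 5% relative reduction, even though the absolute difference is one percentage point.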

Sadaoki Furui | Máté Szarvas
