In this article we evaluate our stochastic morphosyntactic language model (SMLM) on a Hungarian newspaper dictation task that requires modeling over 1 million different word forms. The proposed method uses morphemes as the basic recognition units and combines a morpheme n-gram model with a morphosyntactic language model. The architecture of the recognition system is based on the weighted finite-state transducer (WFST) paradigm. Thanks to the flexible transducer-based architecture, the morphosyntactic component integrates seamlessly with the basic modules, with no need to modify the decoder itself. We compare the phoneme, morpheme, and word error rates as well as the sizes of the recognition networks in two configurations: one uses only the n-gram model, while the other uses the combined model. The proposed stochastic morphosyntactic language model reduces the morpheme error rate by between 1.7% and 7.2% relative to the baseline trigram system. The morpheme error rate of the best configuration is 18% and the best word error rate is 22.3%.
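To make the reported metrics concrete, the following is a minimal sketch of how an error rate over arbitrary units (phonemes, morphemes, or words) and a relative reduction are conventionally computed, assuming the standard edit-distance definition. The function names and the toy Hungarian segmentation are illustrative assumptions and are not taken from the evaluated system or its test data.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, deletions, insertions) between token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Error rate in percent: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction in percent, with respect to the baseline."""
    return 100.0 * (baseline - improved) / baseline


# Toy example (hypothetical, not from the paper): one misrecognized suffix.
ref_morphemes = ["ház", "ak", "ban", "lak", "nak"]
hyp_morphemes = ["ház", "ak", "ba", "lak", "nak"]
ref_words = ["házakban", "laknak"]
hyp_words = ["házakba", "laknak"]

morpheme_er = error_rate(ref_morphemes, hyp_morphemes)
word_er = error_rate(ref_words, hyp_words)
```

In this toy case a single wrong morpheme counts as one error among five morphemes but as one error among two words, which illustrates why a word error rate can be higher than the morpheme error rate measured on the same output.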