Performance of LVCSR with morpheme-based and syllable-based recognition units

For large vocabulary continuous speech recognition of highly inflected languages, it is the first step to determine an appropriate speech recognition unit to reduce high out-of-vocabulary rate. We investigate two kinds of approaches to select recognition units. In the morpheme-based approach, we use morpheme as basic recognition unit and merge frequent morpheme pairs into phrases by rule-based method or statistical unit merging method. In statistical unit merging, we investigate the effects of part-of-speech constraints used in selecting merging candidates. In the syllable-based approach, assuming that only text data and pronunciation are available, we obtain merged syllables by using the same statistical merging method where pronunciation variation is taken into account. The experimental results showed that the statistical merging method with appropriate linguistic constraints yields best recognition accuracy. Although the syllable-based approach did not show comparable performance, it has the advantage that it does not require a part-of-speech tagging system.

[1]  김재훈,et al.  오류-보정 기법을 이용한 어휘 모호성 해소 = Lexical disambiguation with error-driven learning , 1996 .

[2]  Kyuwoong Hwang,et al.  Automatic generation of Korean pronunciation variants by multistage applications of phonological rules , 1998, ICSLP.

[3]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[4]  Petra Geutner,et al.  Using morphology towards better large-vocabulary speech recognition systems , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[5]  Alexander H. Waibel,et al.  Class phrase models for language modeling , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6]  Hong-Kwang Jeff Kuo,et al.  Phrase-based language models for speech recognition , 1999, EUROSPEECH.

[7]  Oh-Wook Kwon,et al.  Korean large vocabulary continuous speech recognition using pseudomorpheme units , 1999, EUROSPEECH.

[8]  Klaus Ries,et al.  The Karlsruhe-Verbmobil speech recognition engine , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Dietrich Klakow Language-model optimization by mapping of corpora , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[10]  Alexander H. Waibel,et al.  Serbo-Croatian LVCSR on the dictation and broadcast news domain , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[11]  Klaus Ries,et al.  An automatic method for learning a Japanese lexicon for recognition of spontaneous speech , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).