Slovenian large vocabulary speech recognition with data-driven models of inflectional morphology

The paper describes experiments in large vocabulary speech recognition of Slovenian, a highly inflective language. The main problem with an inflective language is its high out-of-vocabulary (OOV) rate. To achieve a usable OOV rate, smaller modeling units (namely stems and endings) are used instead of words. Word decompositions are derived with data-driven methods. Experiments with vocabularies of different sizes were performed to show the effects of data sparsity and acoustic confusability. We compare subword-based models with word-based models, and all results are computed at the word level. The most remarkable improvement is obtained with a vocabulary of 20,000 units. The best results are obtained with subword trigram language models, which improve recognition by 7.5%. Larger vocabularies do not improve the results further: the acoustic confusability of subword units becomes evident, and the statistics of some modeling units are poorly estimated because the units occur infrequently.
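The effect of stem-and-ending units on the OOV rate, and the need to rejoin units before word-level scoring, can be illustrated with a small sketch. It is a toy example and not the decomposition method used in the paper: the ending set is hand-picked for illustration rather than learned from data, and the function names (`oov_rate`, `split_word`, `join_units`) and the `+` marker on endings are assumptions introduced here.

```python
def oov_rate(vocabulary, test_tokens):
    """Fraction of test tokens that are missing from the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for tok in test_tokens if tok not in vocabulary)
    return missing / len(test_tokens)


def split_word(word, endings, min_stem=2):
    """Split a word into [stem, '+ending'] using the longest known ending.

    The word is returned unsplit when no ending matches or when the
    remaining stem would be shorter than min_stem characters.
    """
    for k in range(len(word) - min_stem, 0, -1):
        if word[-k:] in endings:
            return [word[:-k], "+" + word[-k:]]
    return [word]


def join_units(units):
    """Rebuild words from units so that results can be scored at word level."""
    words = []
    for unit in units:
        if unit.startswith("+") and words:
            words[-1] += unit[1:]  # attach the ending to the preceding stem
        else:
            words.append(unit)
    return words


# Hypothetical ending inventory (assumption; a real system would learn it).
endings = {"a", "e", "o"}

train_words = ["miza", "mizo", "knjiga", "knjige"]
test_words = ["mize", "knjigo"]  # inflected forms unseen in training

word_vocab = set(train_words)
unit_vocab = {u for w in train_words for u in split_word(w, endings)}
test_units = [u for w in test_words for u in split_word(w, endings)]

word_oov = oov_rate(word_vocab, test_words)
unit_oov = oov_rate(unit_vocab, test_units)
rebuilt = join_units(test_units)
```

In this toy data both test words are out of vocabulary at the word level, while all of their stems and endings were seen in training, so the unit-level OOV rate drops to zero. The same recombination of shorter units is also what makes them easier to confuse acoustically.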
