On morph-based LVCSR improvements

Efficient large vocabulary continuous speech recognition (LVCSR) of morphologically rich languages is challenging because of rapid vocabulary growth. To improve results, various subword units, called morphs, are used as the basic language elements. The improvements over the word baseline, however, vary widely across languages and tasks, ranging from negative gains to a halving of the error rate. In this paper we attempt to explore the source of this variability. Different LVCSR tasks in an agglutinative language are investigated in numerous experiments using full vocabularies. The improvements are also compared with previously published results for other languages. Important correlations are found between the morph-based improvements and both vocabulary growth and corpus size.

Index Terms: speech recognition, rich morphology, morph, language modeling, LVCSR
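To make the notion of vocabulary growth concrete, the following is a minimal sketch of how a growth curve (distinct types as a function of running tokens) could be measured for a word-based and a morph-based tokenization of the same text. The function name `vocabulary_growth`, the `step` parameter, and the hand-segmented toy data are assumptions made for illustration; they are not the paper's procedure or data.

```python
from typing import Iterable, List, Tuple


def vocabulary_growth(tokens: Iterable[str], step: int = 1) -> List[Tuple[int, int]]:
    """Return (tokens seen, distinct types seen) sampled every `step` tokens.

    The final point is always included, even if the token count is not
    a multiple of `step`.
    """
    seen = set()
    curve = []
    count = 0
    for count, token in enumerate(tokens, start=1):
        seen.add(token)
        if count % step == 0:
            curve.append((count, len(seen)))
    if count and count % step != 0:
        curve.append((count, len(seen)))
    return curve


# Hypothetical toy data: nine inflected word forms and a hand-written
# morph segmentation of the same forms (not taken from the paper).
words = [
    "házak", "házban", "házakban",
    "kertek", "kertben", "kertekben",
    "falak", "falban", "falakban",
]
morphs = [
    "ház", "ak", "ház", "ban", "ház", "ak", "ban",
    "kert", "ek", "kert", "ben", "kert", "ek", "ben",
    "fal", "ak", "fal", "ban", "fal", "ak", "ban",
]

word_curve = vocabulary_growth(words, step=3)
morph_curve = vocabulary_growth(morphs, step=7)
```

In this toy case every word form is a new type, while the morph inventory stops growing once the stems and suffixes have been seen. Real corpora would need a proper segmenter and far more data before any conclusion could be drawn.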
