Improved Recognition of Spontaneous Hungarian Speech—Morphological and Acoustic Modeling Techniques for a Less Resourced Task

Various morphological and acoustic modeling techniques are evaluated on a less resourced, spontaneous Hungarian large-vocabulary continuous speech recognition (LVCSR) task. Among morphologically rich languages, Hungarian is known for its agglutinative, inflectional nature, which aggravates the data sparseness caused by a relatively small training database. Although Hungarian spelling is considered simply phonological, a large part of the corpus is covered by words pronounced in multiple, phonemically different ways. Data-driven and language-specific, knowledge-supported vocabulary decomposition methods are investigated in combination with phoneme- and grapheme-based acoustic modeling techniques on the given task. Both the word baseline and the morph-based advanced baseline results are significantly outperformed by both statistical and grammatical vocabulary decomposition methods. Although the discussed morph-based techniques recognize a significant number of out-of-vocabulary words, the improvements are due not to this fact but to the reduction of insertion errors. Applying grapheme-based acoustic models instead of phoneme-based models causes no severe deterioration in recognition performance. Moreover, a fully data-driven acoustic modeling technique combined with a statistical morphological modeling approach provides the best performance on the most difficult test set. The overall best speech recognition performance is obtained with a novel word-to-morph decomposition technique that combines grammatical and unsupervised statistical segmentation algorithms. The improvement achieved by the proposed technique is stable across acoustic modeling approaches and is larger when speaker adaptation is applied.
