Recent advances in automatic speech recognition for Vietnamese

This paper presents our recent work on automatic speech recognition (ASR) for Vietnamese. First, we describe our methods and tools for collecting and processing text data. For language modeling, we investigate word, sub-word, and hybrid word/sub-word models. For acoustic modeling, where only limited Vietnamese speech data are available, we propose several cross-lingual acoustic modeling techniques. Furthermore, since sub-word units can reduce the high out-of-vocabulary rate and mitigate the lack of text resources in statistical language modeling, we propose several methods to decompose, normalize, and combine word and sub-word lattices generated by different ASR systems. Experimental results on the VnSpeechCorpus demonstrate the feasibility of our methods.
