First steps in fast acoustic modeling for a new target language: application to Vietnamese

This paper presents our first steps in fast acoustic modeling for a new target language. Both knowledge-based and data-driven methods were used to obtain phone mapping tables between a source language (French) and a target language (Vietnamese). While acoustic models borrowed directly from the source language did not perform well, we show that a small amount of adaptation data in the target language (one or two hours) leads to acceptable automatic speech recognition (ASR) performance. For instance, our best continuous Vietnamese recognition system, adapted with only two hours of Vietnamese data, obtains a word accuracy of 63.9% on one hour of Vietnamese speech dialog.
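To make the reported metric concrete, the following is a minimal sketch of how word accuracy is commonly computed, assuming the usual definition (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-edit alignment. The abstract does not state which scoring tool or definition was used, so the function names `edit_counts` and `word_accuracy` and the toy token sequences are illustrative assumptions, not the authors' evaluation setup.

```python
def edit_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) for a minimum-edit
    alignment of a hypothesis word list against a reference word list."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # dp[i][j] = (total_cost, S, D, I) for ref[:i] aligned with hyp[:j]
    dp = [[None] * cols for _ in range(rows)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        dp[i][0] = (i, 0, i, 0)      # only deletions
    for j in range(1, cols):
        dp[0][j] = (j, 0, 0, j)      # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            c, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, n)              # match, no cost
            else:
                diag = (c + 1, s + 1, d, n)      # substitution
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            # Tuples compare by total cost first, so min() picks a cheapest path.
            dp[i][j] = min(diag, deletion, insertion)
    _, s, d, n = dp[-1][-1]
    return s, d, n


def word_accuracy(ref, hyp):
    """Word accuracy under the assumed definition (N - S - D - I) / N."""
    s, d, i = edit_counts(ref, hyp)
    return (len(ref) - s - d - i) / len(ref)


# Toy example with placeholder tokens (not data from the paper):
reference = ["a", "b", "c", "d"]
hypothesis = ["a", "x", "c"]
accuracy = word_accuracy(reference, hypothesis)  # one substitution, one deletion
```

Under this definition insertions are penalized, so word accuracy can be lower than the fraction of correctly recognized words and can even become negative for very poor hypotheses.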
