Paper information - Semi-supervised G2P bootstrapping and its application to ASR for a very under-resourced language: Iban

This paper describes our experiments and results on using a locally dominant language of Malaysia (Malay) to bootstrap automatic speech recognition (ASR) for a very under-resourced language: Iban, which is also spoken in Malaysia, on the island of Borneo. No resources existed in Iban for building a speech recognition system. We therefore tried to take advantage of a language from the same family that shares several similarities with it. First, to address the pronunciation dictionary, we proposed a bootstrapping strategy to develop an Iban pronunciation lexicon from a Malay one. A hybrid version, mixing Malay and Iban pronunciations, was also built and evaluated. We then experimented with three Iban ASR systems, each relying on one of three pronunciation dictionaries: Malay, Iban, or hybrid.

Solange Rossato | Laurent Besacier | Sarah Flora Samson Juan

[1] Minematsu Nobuaki, et al. Evaluations of an Open Source WFST-based Phoneticizer, 2011.

[2] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.

[3] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.

[4] W. Heeringa, et al. The origin of the Afrikaans pronunciation: a comparison to West Germanic languages and Dutch dialects, 2008.

[5] Tanja Schultz, et al. Automatic speech recognition for under-resourced languages: A survey, 2014, Speech Commun.

[6] Grzegorz Kondrak, et al. Online discriminative training for grapheme-to-phoneme conversion, 2009, INTERSPEECH.

[7] Vaibhava Goel, et al. Segmental minimum Bayes-risk decoding for automatic speech recognition, 2004, IEEE Transactions on Speech and Audio Processing.

[8] Ramesh A. Gopinath, et al. Maximum likelihood modeling with Gaussian distributions for classification, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98.

[9] Steven Bird, et al. Phonology, 2002, ArXiv.

[10] K. Adelaar, et al. The Austronesian languages of Asia and Madagascar: a historical perspective, 2005.

[11] Mark Liberman, et al. Transcriber: a free tool for segmenting, labeling and transcribing speech, 1998, LREC.

[12] Mark Liberman, et al. Transcriber: Development and use of a tool for assisting speech corpora production, 2001, Speech Commun.

[13] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[14] Haizhou Li, et al. MASS: A Malay language LVCSR corpus resource, 2009, 2009 Oriental COCOSDA International Conference on Speech Database and Assessments.

[15] Mark J. F. Gales, et al. Maximum likelihood linear transformations for HMM-based speech recognition, 1998, Comput. Speech Lang.

[16] Hermann Ney, et al. Joint-sequence models for grapheme-to-phoneme conversion, 2008, Speech Commun.

[17] Laurent Besacier, et al. Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language, 2013.

[18] Marelie H. Davel, et al. Pronunciation dictionary development in resource-scarce environments, 2009, INTERSPEECH.