Using closely-related language to build an ASR for a very under-resourced language: Iban

This paper describes our work on an automatic speech recognition (ASR) system for an under-resourced language, Iban, which is spoken in Sarawak, a Malaysian state on Borneo. Because no ASR resources existed for this language, we began by collecting 8 hours of speech data. Given this lack of resources, we applied bootstrapping techniques using a closely related language to build the Iban system. Specifically, we used Malay data to bootstrap the grapheme-to-phoneme (G2P) system for the target language. We also developed several G2P systems to produce Iban pronunciation dictionaries, which were then evaluated in the Iban ASR system to select the best version. Subsequently, we conducted cross-lingual ASR experiments using subspace Gaussian mixture models (SGMMs), in which the shared parameters were obtained in either a monolingual or a multilingual fashion. We observed that using out-of-language data as the source language yielded a lower word error rate (WER) when Iban data is very limited.
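The results above are reported as WER. For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words). It is not the scoring tool used in this work; the function name `word_error_rate` and the placeholder tokens are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper only: tokens are split on whitespace, with no text
    normalization, which a real scoring pipeline would normally apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens, not real Iban transcripts.
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

In this toy call, one reference word is substituted and one is deleted out of four, so the function returns 0.5. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.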
