Acoustic model merging using acoustic models from multilingual speakers for automatic speech recognition

Many studies have explored the use of existing multilingual speech corpora to build an acoustic model for a target language. Work on multilingual acoustic modeling typically uses multilingual acoustic models to create an initial model, which is often suboptimal for decoding speech in the target language; some target-language speech is then used to adapt and improve this initial model. In this paper, by contrast, we investigate the use of multilingual acoustic modeling to enhance an acoustic model of the target language for automatic speech recognition (ASR). The proposed approach employs context-dependent acoustic model merging, in which acoustic models of a source language are used to adapt the acoustic model of the target language. The source- and target-language speech are produced by speakers from the same country. Experiments on Malay and English ASR show relative word error rate (WER) improvements ranging from 2% to about 10% when the multilingual acoustic model is employed.
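
The abstract does not spell out the merging rule, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes diagonal-covariance GMM states, an already-known mapping from each target-language context-dependent state to a source-language state, and a hypothetical interpolation weight `lam`. Under a simple weighted-pooling scheme, merging two mapped states means concatenating their Gaussians and rescaling the mixture weights so they still sum to one. `DiagGMM`, `merge_state_gmms`, `log_likelihood`, and `relative_wer_reduction` are illustrative names, not code from the paper or from any toolkit. The last helper makes the reported "relative improvement" concrete.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class DiagGMM:
    """Diagonal-covariance GMM for one context-dependent HMM state (illustrative)."""
    weights: np.ndarray    # shape (K,), assumed to sum to 1
    means: np.ndarray      # shape (K, D)
    variances: np.ndarray  # shape (K, D), all entries > 0


def merge_state_gmms(target: DiagGMM, source: DiagGMM, lam: float) -> DiagGMM:
    """Pool the Gaussians of a target-language state and its mapped
    source-language state (assumed weighted-pooling scheme).

    `lam` is a hypothetical weight for the target components; the source
    components get 1 - lam, so the merged weights still sum to 1 whenever
    both input weight vectors do.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if target.means.shape[1] != source.means.shape[1]:
        raise ValueError("feature dimensions differ")
    weights = np.concatenate([lam * target.weights, (1.0 - lam) * source.weights])
    means = np.vstack([target.means, source.means])
    variances = np.vstack([target.variances, source.variances])
    return DiagGMM(weights, means, variances)


def log_likelihood(gmm: DiagGMM, x: np.ndarray) -> float:
    """log p(x) for one feature vector x of shape (D,), via log-sum-exp."""
    diff = x - gmm.means                                           # (K, D)
    log_det = np.sum(np.log(2.0 * np.pi * gmm.variances), axis=1)  # (K,)
    mahal = np.sum(diff * diff / gmm.variances, axis=1)            # (K,)
    log_comp = np.log(gmm.weights) - 0.5 * (log_det + mahal)       # (K,)
    m = np.max(log_comp)
    return float(m + np.log(np.sum(np.exp(log_comp - m))))


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER improvement in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

For instance, with a hypothetical baseline WER of 20.0% and a merged-model WER of 18.0%, `relative_wer_reduction(20.0, 18.0)` gives 10.0, a 10% relative gain even though the absolute drop is only 2 points. These numbers are illustrative and are not taken from the paper's experiments.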
