Multilingual acoustic models for the recognition of non-native speech

We report on the use of multilingual hidden Markov models for the recognition of non-native speech. Based on the design of a common phoneme set that achieves a phone compression rate of almost 80 percent compared to a conglomerate of language-dependent phone sets, we create acoustic models that share training data from up to five languages. Results obtained on two different databases of non-native English demonstrate the feasibility of the approach, showing improved recognition accuracy in the case of sparse training material, and also for speakers whose native language is not in the training data.
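The core idea of the common phoneme set can be sketched as collapsing several language-dependent phone inventories onto shared (IPA-like) symbols, so that one acoustic model per shared phone is trained on data pooled across languages. The phone labels and mapping below are invented for illustration; they are not the paper's actual phone table, and the compression figure depends entirely on the mapping chosen.

```python
# Illustrative sketch: merge language-dependent phone sets into a shared inventory.
# All phone labels and the mapping are hypothetical, for illustration only.

# Hypothetical per-language phone inventories.
lang_phones = {
    "en": ["en_p", "en_t", "en_k", "en_ae"],
    "de": ["de_p", "de_t", "de_k", "de_oe"],
    "fr": ["fr_p", "fr_t", "fr_k", "fr_eu"],
}

# Map each language-dependent phone to a shared symbol; acoustically
# near-equivalent phones from different languages collapse to one model.
to_shared = {
    "en_p": "p", "de_p": "p", "fr_p": "p",
    "en_t": "t", "de_t": "t", "fr_t": "t",
    "en_k": "k", "de_k": "k", "fr_k": "k",
    "en_ae": "ae", "de_oe": "oe", "fr_eu": "oe",  # merged vowel pair
}

total = sum(len(v) for v in lang_phones.values())                 # 12 phones
shared = {to_shared[p] for v in lang_phones.values() for p in v}  # 5 symbols
compression = 1 - len(shared) / total
print(f"{total} language-dependent phones -> {len(shared)} shared "
      f"({compression:.0%} compression)")
```

Each shared symbol then receives a single HMM whose training data is the union of all mapped phones' data across languages, which is what allows sparse per-language material to be supplemented.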
