MLLR-based accent model adaptation without accented data

When the user has an accent different from what the automatic speech recognization system is trained with, the performance of the systems degrades. This is attributed to both acoustic and phonological differences between accents. The phonological differences between two accents are due to different phoneme inventories in two languages. Even for the same phoneme, foreigners and native speakers pronounce different sounds. Since accented data is rare but monolingual data is abundant, we propose using the accented speaker’s first language data directly instead of accented data in the second language for our purpose. We propose adapting the native English phoneme models to accented phoneme models using first language data in MLLR adaptation. The baseline performance is 35.25% (phone accuracy) in using native English phone models to recognize Cantoneseaccented English speech data. We compare accent adaptation by using accented data and source language data. On the average, using accented data for adaptation improves the phone accuracy by 69.98% while using source language data for adaptation improves the phone accuracy by 70.13%. This shows that both kinds of adaptation data give similar improvements. Therefore non-accented data can be used for adaptation. We can rapidly obtain an accent-adapted acoustic model without the need of collecting accented database.

[1]  John H. L. Hansen,et al.  Foreign accent classification using source generator based prosodic features , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[2]  Tan Lee,et al.  Acoustic modeling and language modeling for cantonese LVCSR , 1999, EUROSPEECH.

[3]  J. Foote,et al.  WSJCAM0: A BRITISH ENGLISH SPEECH CORPUS FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION , 1995 .

[4]  Mark J. F. Gales,et al.  Mean and variance adaptation within the MLLR framework , 1996, Comput. Speech Lang..

[5]  Sandra P. Whiteside,et al.  Analysis of ten vowel sounds across gender and regional/cultural accent , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[6]  Jean-Luc Gauvain,et al.  Speaker adaptation based on MAP estimation of HMM parameters , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Pascale Fung,et al.  Fast accent identification and accented speech recognition , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8]  Steve Renals,et al.  WSJCAMO: a British English speech corpus for large vocabulary continuous speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[9]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..