Asymmetric acoustic modeling of mixed language speech

We propose asymmetric acoustic modeling to improve speech recognition performance on speaker-independent, mixed language speech. Mixed language speech takes one of two forms: inter-sentential code switching from the matrix (source) language to a foreign language, or intra-sentential code mixing, in which foreign words or phrases are embedded in the matrix language. In either case, the foreign phrases are pronounced by the matrix language speaker with varying degrees of accent. Our proposed system uses selective decision tree merging between a bilingual model and an accented embedded speech model. It outperforms two previous approaches: a bilingual model with model retraining, by 21.51%, and adaptation, by 15.88%. It also outperforms all other models on both the code mixing and the code switching cases. The system improves recognition of embedded foreign speech without degrading performance on matrix language speech.
