Improved mixed language speech recognition using asymmetric acoustic model and language model with code-switch inversion constraints

We propose an integrated framework for large-vocabulary continuous mixed-language speech recognition that handles the accent effect in the bilingual acoustic model and applies the inversion constraint, well known to linguists, in the language model. Our asymmetric acoustic model with phone set extension improves upon previous work by striking a balance between data and phonetic knowledge. Our language model improves upon previous work by (1) using the inversion constraint to predict code-switching points in the mixed language and (2) integrating a code-switch prediction model, a translation model, and a reconstruction model. This integration avoids the error propagation that could arise from decoupling these steps. Finally, a WFST-based decoder integrates the acoustic models, the code-switch language model, and a monolingual language model in the matrix language. Our system reduces word error rate by 1.88% on a lecture speech corpus and by 2.43% on a lunch conversation corpus, with statistical significance, over the conventional bilingual acoustic model and interpolated language model.
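For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the reference length. The function name, the example tokens, and the tokenization (one token per Mandarin character or English word) are assumptions made for illustration; the abstract does not specify the scoring setup used in the paper.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference tokens
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical mixed-language utterance; tokenization is an assumption.
reference = ["我", "想", "check", "email"]
hypothesis = ["我", "想", "查", "email"]
wer = word_error_rate(reference, hypothesis)  # one substitution in four tokens
```

In a mixed-language setting the choice of token unit (characters versus words for Mandarin) changes the denominator, so reported figures are only comparable under the same tokenization.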
