OC16-CE80: A Chinese-English mixlingual database and a speech recognition baseline

We present the OC16-CE80 Chinese-English mixlingual speech database, which was released as the main resource for training, development, and testing in the Chinese-English mixlingual speech recognition (MixASR-CHEN) challenge at O-COCOSDA 2016. The database consists of 80 hours of speech recorded from more than 1,400 speakers; the utterances are in Chinese, but each contains one or several English words. Using this database and two other free resources (THCHS30 and the CMU dictionary), we constructed an automatic speech recognition (ASR) baseline with a deep neural network-hidden Markov model (DNN-HMM) hybrid system. We then report the baseline results following the MixASR-CHEN evaluation rules and demonstrate that OC16-CE80 is a reasonable data resource for mixlingual research.
