Paper information - Acoustic data augmentation for Mandarin-English code-switching speech recognition

Abstract: Code-switching (CS) is a multilingual phenomenon in which a speaker uses different languages within an utterance or between alternating utterances. Developing large-scale datasets for training code-switching acoustic and language models is challenging and extremely expensive. In this paper, we focus on acoustic data augmentation for the Mandarin-English CS speech recognition task. The effectiveness of conventional acoustic data augmentation approaches is examined. More importantly, we propose a CS acoustic event detection system based on a deep neural network to extract real code-switching speech segments automatically. Semi-supervised and active learning techniques are then investigated to generate transcriptions of these segments. Finally, a code-switching speech synthesis system is introduced to further enhance the acoustic modeling. Experimental results on the OC16-CE80 data, a Mandarin-English mixlingual speech corpus, demonstrate the effectiveness of the proposed methods.

[1] David A. van Leeuwen et al. Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection, 2017, INTERSPEECH.

[2] Yonghong Yan et al. An Exploration of Dropout with LSTMs, 2017, INTERSPEECH.

[3] Haizhou Li et al. A first speech recognition system for Mandarin-English code-switch conversational speech, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4] Lin-Shan Lee et al. An integrated framework for transcribing Mandarin-English code-mixed lectures with improved acoustic and language modeling, 2010, 2010 7th International Symposium on Chinese Spoken Language Processing.

[5] Quoc V. Le et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.

[6] Kai Yu et al. A comparative study of robustness of deep learning approaches for VAD, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7] Chung-Hsien Wu et al. Code-Switching Event Detection by Using a Latent Language Space Model and the Delta-Bayesian Information Criterion, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[8] David A. van Leeuwen et al. Semi-supervised acoustic model training for speech with code-switching, 2018, Speech Commun.

[9] Chng Eng Siong et al. Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition, 2018, INTERSPEECH.

[10] Dau-Cheng Lyu et al. Speech Recognition on Code-Switching Among the Chinese Dialects, 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[11] Dong Wang et al. OC16-CE80: A Chinese-English mixlingual database and a speech recognition baseline, 2016, 2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA).

[12] Satoshi Nakamura et al. Transcribing against time, 2017, Speech Commun.

[13] C. Baker. Foundations of Bilingual Education and Bilingualism, 1993.

[14] Ngoc Thang Vu et al. Combining recurrent neural networks and factored language models during decoding of code-Switching speech, 2014, INTERSPEECH.

[15] Xiao Song et al. A Multi-task Learning Approach for Mandarin-English Code-Switching Conversational Speech Recognition, 2017, ISICA.

[16] Yiming Wang et al. Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI, 2016, INTERSPEECH.

[17] Yanhua Long et al. Active Learning for LF-MMI Trained Neural Networks in ASR, 2018, INTERSPEECH.

[18] Bo Xu et al. Chinese-English bilingual phone modeling for cross-language speech recognition, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[19] Yanhua Long et al. Large-Scale Semi-Supervised Training in Deep Learning Acoustic Model for ASR, 2019, IEEE Access.

[20] Sanjeev Khudanpur et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21] Alan W. Black et al. Automatic Detection of Code-switching Style from Acoustics, 2018, CodeSwitch@ACL.

[22] Tan Lee et al. Development of a Cantonese-English code-mixing speech corpus, 2005, INTERSPEECH.

[23] Haizhou Li et al. SEAME: a Mandarin-English code-switching speech corpus in South-East Asia, 2010, INTERSPEECH.

[24] Riyaz Ahmad Bhat et al. Language Identification in Code-Switching Scenario, 2014, CodeSwitch@EMNLP.

[25] Thomas Niesler et al. Building a Unified Code-Switching ASR System for South African Languages, 2018, INTERSPEECH.

[26] P. Auer et al. Code-switching in conversation: Language, interaction and identity, 2000.

[27] Sanjeev Khudanpur et al. A study on data augmentation of reverberant speech for robust speech recognition, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).