Improving Code-Switching Language Modeling with Artificially Generated Texts Using Cycle-Consistent Adversarial Networks

This paper presents our latest effort to improve code-switching language models, which suffer from data scarcity. We investigate methods to augment code-switching training text by generating it artificially. Concretely, we propose a framework based on cycle-consistent adversarial networks that transfers monolingual text into code-switching text, treating code-switching as a speaking style. Our experimental results on the SEAME corpus show that using artificially generated code-switching text consistently improves both the language model and automatic speech recognition performance.
