Multi-Graph Decoding for Code-Switching ASR

In the FAME! Project, a code-switching (CS) automatic speech recognition (ASR) system for Frisian-Dutch speech is developed to accurately transcribe the local broadcaster's bilingual archives. These archives contain monolingual Frisian and Dutch speech segments as well as Frisian-Dutch CS speech, so recognition performance on monolingual segments is also vital for accurate transcription. In this work, we propose a multi-graph decoding and rescoring strategy for CS ASR that uses bilingual and monolingual graphs together with a unified acoustic model. The proposed decoding scheme gives the freedom to design and employ a separate search space for each (monolingual or bilingual) recognition task, and it enables effective use of the monolingual resources of the high-resourced language of the mixed pair in low-resourced CS scenarios. In our scenario, Dutch is the high-resourced language and Frisian is the low-resourced language. We therefore use additional monolingual Dutch text resources to improve the Dutch language model (LM), and we compare the performance of single- and multi-graph CS ASR systems on Dutch segments using larger Dutch LMs. The ASR results show that the proposed approach outperforms the baseline single-graph CS ASR systems: it performs better on the monolingual Dutch segments without any accuracy loss on monolingual Frisian and code-mixed segments.
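
As a rough illustration of the multi-graph idea, the toy sketch below scores candidate transcriptions against several search spaces that share one set of acoustic scores, and keeps the best-scoring pair. Everything in it is an assumption made for illustration: the `GRAPHS` dictionaries stand in for real decoding graphs as simple unigram log-probability tables, the words and scores are invented, and `multi_graph_decode` is a hypothetical helper, not the decoding or rescoring procedure used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for decoding graphs: word -> LM log-probability.
# The "dutch" graph mimics a larger monolingual LM with sharper Dutch estimates.
GRAPHS: Dict[str, Dict[str, float]] = {
    "bilingual": {"het": -3.0, "is": -2.0, "mooi": -4.0, "it": -3.0, "moai": -4.0},
    "dutch": {"het": -1.5, "is": -1.0, "mooi": -2.0},
}

OOV_LOGP = -20.0  # assumed penalty for words missing from a graph


def lm_score(graph: Dict[str, float], words: List[str]) -> float:
    """Sum unigram log-probabilities, penalising out-of-vocabulary words."""
    return sum(graph.get(w, OOV_LOGP) for w in words)


def multi_graph_decode(
    candidates: List[Tuple[List[str], float]],
    graphs: Dict[str, Dict[str, float]],
    lm_weight: float = 1.0,
) -> Tuple[List[str], str, float]:
    """Pick the (hypothesis, graph) pair with the highest combined score.

    Each candidate is (words, acoustic_logp); the acoustic score is shared
    across graphs, mirroring the use of a single unified acoustic model.
    """
    best = None
    for words, am_logp in candidates:
        for name, graph in graphs.items():
            total = am_logp + lm_weight * lm_score(graph, words)
            if best is None or total > best[2]:
                best = (words, name, total)
    if best is None:
        raise ValueError("no candidates or graphs supplied")
    return best


# A Dutch-sounding segment: the monolingual Dutch graph wins.
dutch_result = multi_graph_decode(
    [(["het", "is", "mooi"], -10.0), (["it", "is", "moai"], -10.5)], GRAPHS
)

# A Frisian segment: only the bilingual graph covers the words.
frisian_result = multi_graph_decode([(["it", "is", "moai"], -8.0)], GRAPHS)
```

In this toy setup, a stronger Dutch table can only raise the score of Dutch hypotheses, while Frisian and mixed hypotheses still fall back to the bilingual table, which loosely mirrors the behaviour the abstract reports. Comparing raw scores across graphs built from different LMs is itself a simplification.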
