Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech

Code-switching (CS), defined as the mixing of languages in conversations, has become a worldwide phenomenon. Its prevalence has recently been met with growing demand for, and interest in, building CS ASR systems. In this paper, we present our work on code-switched Egyptian Arabic-English automatic speech recognition (ASR). We first contribute to filling the large gap in resources by collecting, analyzing, and publishing our spontaneous CS Egyptian Arabic-English speech corpus. We build our ASR systems using DNN-based hybrid and Transformer-based end-to-end models, and present a thorough comparison between the two approaches in the setting of a low-resource, orthographically unstandardized, and morphologically rich language pair. We show that while both systems give comparable overall recognition results, each has complementary strengths, and that recognition can be improved by combining their outputs. We propose several effective system combination approaches, in which the hypotheses of both systems are merged at the sentence and word levels. Our approaches yield an overall relative WER improvement of 4.7% over a baseline of 32.1% WER. On intra-sentential CS sentences, we achieve a relative WER improvement of 4.8%. Our best-performing system achieves 30.6% WER on the ArzEn test set.
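As a reading aid, the relationship between the reported absolute and relative WER figures can be checked with a minimal sketch. The function names `wer` and `relative_improvement` are illustrative assumptions, not part of the paper's tooling, and the word-level edit distance below is the standard textbook definition of WER rather than the authors' scoring pipeline.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Illustrative helper (assumed name); uses whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, improved_wer: float) -> float:
    """Relative WER reduction with respect to the baseline (assumed name)."""
    return (baseline_wer - improved_wer) / baseline_wer


# Figures quoted in the abstract: 32.1% baseline, 30.6% best system.
gain = relative_improvement(32.1, 30.6)
print(f"relative improvement: {gain * 100:.1f}%")
```

With the figures quoted above, a drop from 32.1% to 30.6% WER is an absolute reduction of 1.5 points, which corresponds to the reported relative improvement of roughly 4.7%.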
