ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Code-switching is a speech phenomenon in which a speaker switches between languages within a conversation. Although code-switching arises spontaneously in conversational speech, most existing work collects code-switching data from read speech rather than spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built from spontaneous multi-turn conversational dialogue collected in Hong Kong. We report ASCEND’s design and the procedure for collecting and annotating the speech data. ASCEND consists of 10.62 hours of clean speech collected from 23 bilingual speakers of Chinese and English. We also conduct baseline experiments using pre-trained wav2vec 2.0 models, achieving a best performance of 22.69% character error rate and 27.05% mixed error rate.
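
The two reported metrics can be illustrated with a minimal sketch. It assumes the commonly used definition of mixed error rate, in which each Chinese character and each English word counts as one token, and it is not taken from the ASCEND evaluation code. The function names, the tokenization regex (which covers only the basic CJK Unified Ideographs block), and the example sentences are all hypothetical.

```python
import re

# Assumption: one token per CJK character (basic block U+4E00..U+9FFF);
# any other run of non-space characters is treated as one word token.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\u4e00-\u9fff\s]+")


def mixed_tokens(text):
    """Split a transcript into Chinese characters and lower-cased words."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref, hyp):
    """Character error rate: every non-space character is a token."""
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


def mer(ref, hyp):
    """Mixed error rate: Chinese characters plus English words as tokens."""
    return error_rate(mixed_tokens(ref), mixed_tokens(hyp))


if __name__ == "__main__":
    reference = "我想要 book 一个 meeting room"
    hypothesis = "我想要 book 一个 meeting"
    print(f"CER = {cer(reference, hypothesis):.3f}")
    print(f"MER = {mer(reference, hypothesis):.3f}")
```

In this toy example the dropped word "room" costs four character deletions under CER but a single token deletion under MER, which is why the two metrics can diverge on code-switched utterances.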
