Speech Synthesis of Code-Mixed Text

Most text-to-speech (TTS) systems today assume that the input text is written in a single language, the same language in which it is to be synthesized. However, in bilingual and multilingual communities, speakers often switch between languages within a single utterance, a phenomenon known as code mixing or code switching. With the popularity of social media, code mixing now also appears in the text these communities write. TTS systems that synthesize such text must handle input written in multiple languages and scripts. Code-mixed text poses several challenges for TTS systems, including language identification, spelling normalization, and pronunciation modeling. In this work, we describe a preliminary framework for synthesizing code-mixed text and carry out experiments on code-mixed Hindi and English text. We find a significant user preference for TTS systems that correctly identify and pronounce words in different languages.
