Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text

Most Text to Speech (TTS) systems today assume that the input is in a single language written in its native script, which is the language that the TTS database is recorded in. However, due to the rise in conversational data available from social media, phenomena such as code-mixing, in which multiple languages are used together in the same conversation or sentence are now seen in text. TTS systems capable of synthesizing such text need to be able to handle multiple languages at the same time, and may also need to deal with noisy input. Previously, we proposed a framework to synthesize code-mixed text by using a TTS database in a single language, identifying the language that each word was from, normalizing spellings of a language written in a non-standardized script and mapping the phonetic space of mixed language to the language that the TTS database was recorded in. We extend this cross-lingual approach to more language pairs, and improve upon our language identification technique. We conduct listening tests to determine which of the two languages being mixed should be used as the target language. We perform experiments for code-mixed Hindi-English and German-English and conduct listening tests with bilingual speakers of these languages. From our subjective experiments we find that listeners have a strong preference for cross-lingual systems with Hindi as the target language for code-mixed Hindi and English text. We also find that listeners prefer cross-lingual systems in English that can synthesize German text for codemixed German and English text.

[1]  Theresa Wilson,et al.  Language Identification for Creating Language-Specific Twitter Collections , 2012 .

[2]  Dong Nguyen,et al.  Word Level Language Identification in Online Multilingual Communication , 2013, EMNLP.

[3]  Frank K. Soong,et al.  An HMM-based bilingual (Mandarin-English) TTS , 2007, SSW.

[4]  Jatin Sharma,et al.  POS Tagging of English-Hindi Code-Mixed Social Media Content , 2014, EMNLP.

[5]  Monojit Choudhury,et al.  Word-level Language Identification using CRF: Code-switching Shared Task Report of MSR India System , 2014, CodeSwitch@EMNLP.

[6]  Alan W Black,et al.  The Festvox Indic Frontend for Grapheme-to-Phoneme Conversion , 2016 .

[7]  Philipp Koehn,et al.  Europarl: A Parallel Corpus for Statistical Machine Translation , 2005, MTSUMMIT.

[8]  Alan W. Black,et al.  Speech Synthesis of Code-Mixed Text , 2016, LREC.

[9]  Ben King,et al.  Labeling the Languages of Words in Mixed-Language Documents using Weakly Supervised Methods , 2013, NAACL.

[10]  Fei Xia,et al.  Language ID in the Context of Harvesting Language Data off the Web , 2009, EACL.

[11]  Mykola Pechenizkiy,et al.  Graph-Based N-gram Language Identication on Short Texts , 2011 .

[12]  Marelie H. Davel,et al.  Implications of Sepedi/English code switching for ASR systems , 2013 .

[13]  Sadaoki Furui,et al.  New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer , 2006, Speech Commun..

[14]  Jatin Sharma,et al.  Query word labeling and Back Transliteration for Indian Languages: Shared task system description , 2013 .

[15]  Haizhou Li,et al.  A first speech recognition system for Mandarin-English code-switch conversational speech , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16]  Yong Zhao,et al.  Microsoft Mulan - a bilingual TTS system , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[17]  Parth Gupta,et al.  Query expansion for mixed-script information retrieval , 2014, SIGIR.

[18]  Somnath Banerjee,et al.  Overview of FIRE-2015 Shared Task on Mixed Script Information Retrieval , 2015, FIRE Workshops.

[19]  Jatin Sharma,et al.  “I am borrowing ya mixing ?" An Analysis of English-Hindi Code Mixing in Facebook , 2014, CodeSwitch@EMNLP.

[20]  Tien Ping Tan,et al.  Automatic Speech Recognition of Code Switching Speech Using 1-Best Rescoring , 2012, 2012 International Conference on Asian Language Processing.

[21]  Julia Hirschberg,et al.  Overview for the First Shared Task on Language Identification in Code-Switched Data , 2014, CodeSwitch@EMNLP.

[22]  Prasenjit Majumder,et al.  Overview of the FIRE 2013 Track on Transliterated Search , 2013, FIRE.

[23]  Rishiraj Saha Roy,et al.  Overview and Datasets of FIRE 2013 Track on Transliterated Search , 2013 .

[24]  N. Poulisse,et al.  Duelling Languages: Grammatical Structure in Codeswitching , 1998 .

[25]  Beat Pfister,et al.  From multilingual to polyglot speech synthesis , 1999, EUROSPEECH.

[26]  LiHaizhou,et al.  Mandarin---English code-switching speech corpus in South-East Asia , 2015 .

[27]  Timothy Baldwin,et al.  langid.py: An Off-the-shelf Language Identification Tool , 2012, ACL.

[28]  Alan W. Black,et al.  The CMU Arctic speech databases , 2004, SSW.

[29]  Chng Eng Siong,et al.  Mandarin–English code-switching speech corpus in South-East Asia: SEAME , 2015, Lang. Resour. Evaluation.

[30]  Gokul Chittaranjan,et al.  Overview of FIRE 2014 Track on Transliterated Search , 2014 .

[31]  Paul Taylor,et al.  The architecture of the Festival speech synthesis system , 1998, SSW.

[32]  Alan W. Black,et al.  CLUSTERGEN: a statistical parametric synthesizer using trajectory modeling , 2006, INTERSPEECH.