Toward multiple-language TTS: experiments in English and Mandarin

Text-to-speech systems have improved dramatically in recent years through the use of corpus-based concatenative approaches, and there is growing interest in enabling them to handle languages beyond the native language for which they were developed. In this paper we present ongoing work at IBM on text-to-speech systems that can produce high-quality synthesis in more than one language. We illustrate the discussion with a case study in which two systems, originally developed to support English and Mandarin respectively, have been extended to support each other’s languages. We describe the challenges faced when adapting one system to a different target language, propose adaptation solutions, and present the results of perceptual tests carried out to evaluate how the adapted systems compare with the native systems.
