Text analysis and language identification for polyglot text-to-speech synthesis

In multilingual countries, text-to-speech synthesis systems often have to deal with texts that contain inclusions from several other languages in the form of phrases, words, or even parts of words. In such multilingual cultural settings, listeners expect a high-quality text-to-speech synthesis system to read these texts so that the origin of each inclusion is audible, i.e., with correct language-specific pronunciation and prosody. The challenge for the text analysis component of a text-to-speech synthesis system is to derive from mixed-lingual sentences the correct polyglot phone sequence and all information necessary to generate natural-sounding polyglot prosody. This article presents a new approach to analyzing mixed-lingual sentences. The approach centers on a modular, mixed-lingual morphological and syntactic analyzer, which additionally provides accurate language identification at the morpheme level as well as word and sentence boundary identification in mixed-lingual texts. It can also be applied to word identification in languages such as Chinese or Japanese that have no designated word boundary symbol. To date, this mixed-lingual text analysis supports any mixture of English, French, German, Italian, and Spanish. Because of its modular design, it is easily extensible to additional languages.
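To make the idea of morpheme-level language identification concrete, the following is a minimal sketch, not the analyzer described in this article. It swaps the morphological and syntactic grammar for a plain lexicon lookup: a word is split into known morphemes, each tagged with the language of the lexicon it was found in, and the segmentation with the fewest morphemes is preferred. The lexicon contents, the `LEXICONS` table, and the `segment` function are all hypothetical and chosen only for illustration.

```python
from functools import lru_cache

# Hypothetical toy morpheme lexicons (assumption: not taken from the article).
LEXICONS = {
    "de": {"ge", "et", "en", "haus"},
    "en": {"download", "manage", "the"},
    "fr": {"rendez", "vous"},
}


def segment(word):
    """Split `word` into (morpheme, language) pairs using the toy lexicons.

    Returns the covering segmentation with the fewest morphemes,
    or None if the word cannot be fully covered by known morphemes.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # An empty remainder is covered by an empty segmentation.
        if start == len(word):
            return ()
        candidates = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            for lang in sorted(LEXICONS):
                if piece in LEXICONS[lang]:
                    rest = best(end)
                    if rest is not None:
                        candidates.append(((piece, lang),) + rest)
        if not candidates:
            return None
        # Prefer the segmentation with the fewest morphemes.
        return min(candidates, key=len)

    result = best(0)
    return list(result) if result is not None else None


if __name__ == "__main__":
    # German verb forms built on an English stem.
    print(segment("downloaden"))
    print(segment("gedownloadet"))
```

With these toy lexicons, a German infinitive built on an English stem such as "downloaden" is split into an English stem and a German ending. A real system would resolve ambiguous morphemes with morphological and syntactic constraints rather than with a shortest-segmentation heuristic.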
