Paper information - A Data-driven Adaptation of Prosody in a Multilingual TTS

A Data-driven Adaptation of Prosody in a Multilingual TTS

Proper accentuation and phrasing make the syntactic and semantic structure of a message more transparent to the listener. Good prosody modeling in a TTS system therefore has to be structured into appropriate levels. The implemented prosodic hierarchy should guide the listener's attention and support the comprehension process. Since prosody can function as a distractor, the prosody module in a TTS system must be built very carefully. With the goal of improving naturalness, a selective hierarchical approach to prominence disambiguation and symbolic modeling is introduced. The selective, statistically based concept for prominence disambiguation and prediction is discussed, and the implementation of a neural network (NN) module for predicting symbolic tags in a multilingual TTS system is presented. We conclude with prediction results and a suitability test of the selective approach, based on preliminary acoustic tests performed in a multilingual TTS system.

Zdravko Kacic | Bogomir Horvat | Janez Stergar | Çaglayan Erdem
