
An intonation model for embedded devices based on natural F0 samples

The evolution of hand-held devices has made it possible to port high-quality Text-to-Speech systems to embedded platforms. However, the linguistic resources required to build natural-sounding prosody models still need to be scaled down to meet the hardware specifications of these devices. In this work, we present a compact intonation model that combines the naturalness of corpus-based prosody modeling with the limited resources available to embedded TtS applications. F0 samples taken at 3 points per syllable over a small set of appropriately designed utterances form the tonal unit database. At synthesis time, the sampled points are applied over an onset/offset syllabic structuring of phrases. The model requires less than 1 KB of storage per prosodic phrase class. Its application to the Greek language is demonstrated using only lexical stress information.
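The three-points-per-syllable sampling described above can be illustrated with a minimal sketch. The abstract does not say where in the syllable the points fall, how F0 values are encoded, or how the onset/offset structuring selects samples, so the sampling positions (`SAMPLE_POSITIONS`), the 10 ms frame shift, the 2-byte-per-value storage and all function names below are hypothetical assumptions. This is not the authors' implementation, only a way to see why a per-syllable three-point encoding is compact.

```python
# Hypothetical sampling positions, as fractions of syllable duration.
# The abstract states 3 points per syllable but not their placement.
SAMPLE_POSITIONS = (0.1, 0.5, 0.9)


def sample_syllable(f0_track, start, end, frame_shift=0.01):
    """Return 3 F0 values (Hz) for a syllable spanning [start, end) seconds.

    f0_track holds one F0 value per analysis frame of `frame_shift` seconds
    (10 ms is an assumed default). Each sample is the nearest frame value.
    """
    samples = []
    for p in SAMPLE_POSITIONS:
        t = start + p * (end - start)
        # Round to the nearest frame and clamp to the valid index range.
        idx = min(max(round(t / frame_shift), 0), len(f0_track) - 1)
        samples.append(f0_track[idx])
    return samples


def apply_syllable(samples, start, end):
    """Map 3 stored samples onto a target syllable as (time, F0) targets.

    The same relative positions are reused, so the stored shape stretches
    or shrinks with the synthesized syllable's duration.
    """
    return [(start + p * (end - start), f0) for p, f0 in zip(SAMPLE_POSITIONS, samples)]


def storage_bytes(n_syllables, bytes_per_value=2):
    """Storage for one phrase class: 3 values per syllable (assumed 2 bytes each)."""
    return n_syllables * len(SAMPLE_POSITIONS) * bytes_per_value
```

Under these assumptions, a phrase class built from 100 syllables needs `storage_bytes(100)`, i.e. 600 bytes, which shows how such an encoding can stay within a kilobyte; the model's actual footprint depends on details the abstract does not give.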

Georgios Kouroupetroglou | Gerasimos Xydas
