An intonation model for embedded devices based on natural F0 samples

The evolution of hand-held devices has made it possible to port high-quality Text-to-Speech (TtS) systems to embedded platforms. However, the linguistic resources required to build natural-sounding prosody models still need to be scaled down to meet the hardware specifications of these devices. In this work, we present a compact intonation model that combines the naturalness of corpus-based prosody modeling with the limited resources of embedded TtS applications. The tonal unit database is built by sampling F0 at 3 points per syllable over a small set of appropriately designed utterances. At synthesis time, the sampled points are applied over an onset/offset syllabic structuring of phrases. The model requires less than 1 KB of storage for each prosodic phrase class. Its application to the Greek language is demonstrated using only lexical stress information.
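
As an illustration only, the 3-points-per-syllable sampling and its storage cost can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described in this work: the function names, the even spacing of the sample points, the linear interpolation of the contour, the 50-400 Hz range, and the one-byte quantization are all hypothetical choices made for the example.

```python
from bisect import bisect_left


def interpolate_f0(times, f0, t):
    """Linearly interpolate the F0 contour at time t, clamping at both ends.

    `times` must be strictly increasing; `f0` holds the F0 value at each time.
    """
    if t <= times[0]:
        return f0[0]
    if t >= times[-1]:
        return f0[-1]
    i = bisect_left(times, t)  # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return f0[i - 1] + w * (f0[i] - f0[i - 1])


def sample_syllable(times, f0, start, end, n_points=3):
    """Take n_points evenly spaced F0 samples over one syllable.

    Assumption: samples sit at the syllable start, middle, and end.
    """
    step = (end - start) / (n_points - 1)
    return [interpolate_f0(times, f0, start + k * step) for k in range(n_points)]


def quantize(hz, lo=50.0, hi=400.0):
    """Map an F0 value in Hz to one byte (0..255).

    Assumption: the 50-400 Hz range and 8-bit resolution are illustrative.
    """
    clipped = min(max(hz, lo), hi)
    return round((clipped - lo) / (hi - lo) * 255)


def storage_bytes(n_syllables, n_points=3, bytes_per_sample=1):
    """Storage needed for the sampled points of n_syllables syllables."""
    return n_syllables * n_points * bytes_per_sample


# Toy contour: times in milliseconds, F0 in Hz (made-up values).
times = [0, 100, 200, 300, 400]
f0 = [100, 120, 140, 130, 110]
syllables = [(0, 200), (200, 400)]  # hypothetical syllable boundaries

samples = [sample_syllable(times, f0, s, e) for s, e in syllables]
codes = [[quantize(v) for v in syl] for syl in samples]
```

Under the one-byte-per-sample assumption, each syllable costs 3 bytes, so a 1 KB budget (1024 bytes) would hold the sampled points of up to 341 syllables. This figure follows only from the assumptions of the sketch and is not a result reported here.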