Identification and automatic generation of prosodic contours for a text-to-speech synthesis system in French

This paper presents the implementation of an automatically trainable computational prosodic model for French Text-to-Speech Synthesis. The model is built in two steps. The first step predicts fundamental frequency contours and syllable durations from abstract prosodic markers using neural networks [17,12]. In this step, the abstract prosodic markers are extracted automatically from the signal by analysing prosodic realisations [2] and identifying a prosodic alphabet and a set of labelling rules. The second step integrates the model into the CNET Text-to-Speech Synthesis system [7] by using its linguistic levels and predicting abstract prosodic markers from text and linguistic labels. The system is evaluated by naïve listeners and compared with the current CNET Text-to-Speech Synthesis system.