Automatic transcription of intonation using an identified prosodic alphabet

A solution is proposed for rapidly adapting prosodic models to a new voice or a new application. First, a linguistically motivated prosodic alphabet is identified at the acoustic level. Observing how prosodic events are realised in the acoustic corpus allows classes of breaks, F0 shapes and accents to be constructed and automatic transcription rules to be written. The transcribed corpus is then used to estimate the parameters of a prosodic model for French. The quality of the F0 contours and durations generated by the prosodic model supports the adequacy of the identified alphabet for describing prosodic phenomena. Finally, the prosodic model is integrated into the CNET standard French Text-to-Speech Synthesis system. Naïve listeners judge the quality of the generated prosody to be equivalent to that of the handcrafted system. This result confirms the appropriateness of the alphabet as a set of prosodic descriptors.