An F0 contour control model for totally speaker driven text to speech system

Totally Speaker Driven Text to Speech System produces high quality and natural speech resembling the acoustic and prosodic characteristics of the original speech corpus. In the F0 contour control of this system, an F0 contour of a whole sentence is produced by concatenating segmental F0 contours generated by modifying vectors that are representatives of typical F0 contours. The representative vectors are selected from the F0 contour codebook, which is designed so as to minimize the approximation error between F0 contours generated by the proposed model and real F0 contours extracted from a speech corpus. It was con rmed by experiments with Japanese speech corpus that F0 contours can be modeled with small approximation errors by only 48 representative vectors, and the synthetic speech sounded very natural and resembled the prosodic characteristics of the original speaker.