Generation of F0 contour using stochastic mapping and vector quantization control parameters

This paper introduces an F0 contour generation method for text-to-speech synthesis that uses stochastic mapping and vector-quantized control parameters. The model uses a new F0 contour labelling scheme based on the RFC (rise/fall/connection) model, which describes F0 contour patterns with seven F0 labels and three pause labels. The paper also proposes an efficient method for selecting control parameters, as an alternative to using their mean values. With this model, we achieved 78.06% accuracy in F0 label prediction and 95.87% accuracy in pause label prediction. The experimental results show that speech synthesized with vector-quantized control parameters sounds more natural than speech synthesized with the mean values of the feature parameters.
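The contrast between mean-valued and vector-quantized control parameters can be illustrated with a minimal sketch. It is not the selection method of this paper, whose details are not given in the abstract. It assumes that each F0 label has a set of two-dimensional control parameter vectors (here a hypothetical duration in ms and F0 excursion in Hz), builds a small codebook with plain k-means (Lloyd's algorithm), and picks the codeword nearest to a hypothetical target vector. All function names and data values are illustrative assumptions.

```python
from statistics import fmean


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, codebook):
    """Index of the codeword closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, size, iterations=20):
    """Build a VQ codebook with plain k-means (Lloyd's algorithm).

    Initial codewords are taken at evenly spaced positions in the
    training data so that the result is deterministic.
    """
    step = len(vectors) // size
    codebook = [list(vectors[i * step]) for i in range(size)]
    for _ in range(iterations):
        # Assign every training vector to its nearest codeword.
        cells = [[] for _ in codebook]
        for vector in vectors:
            cells[nearest_index(vector, codebook)].append(vector)
        # Move each codeword to the centroid of its cell; keep the
        # old codeword if the cell is empty.
        codebook = [mean_vector(cell) if cell else code
                    for cell, code in zip(cells, codebook)]
    return codebook


# Hypothetical control parameters for one F0 label:
# (duration in ms, F0 excursion in Hz). Two distinct patterns exist.
rise_parameters = [[100, 20], [110, 22], [300, 60], [310, 62]]

label_mean = mean_vector(rise_parameters)
codebook = train_codebook(rise_parameters, size=2)

# Hypothetical target vector, e.g. derived from the linguistic context.
target = [290, 58]
selected = codebook[nearest_index(target, codebook)]
```

In this toy data the single mean vector falls between the two observed patterns and matches neither, whereas each codeword stays close to a pattern that actually occurs. This is one plausible reason why quantized control parameters could yield more natural contours than mean values.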