Exploring the naturalness of several German high-quality-text-to-speech systems

The synthesis of near-natural F0 contours is an important issue in text-to-speech (TTS) synthesis and is crucial to the naturalness and intelligibility of synthetic speech. In earlier studies, the first author developed a model of German intonation based on the quantitative Fujisaki model. The current paper describes a perception experiment comparing a TTS system incorporating this new approach with several German TTS systems of high segmental quality. Natural speech samples and a synthesis version with natural segment durations served as references. Results show that the natural speech samples unanimously received 10 points on a scale from 0 to 10. The best TTS systems cluster around a mean score of 5.0, whereas the variant with natural durations reached a mean score of 6.6 points, indicating the importance of closely modeling natural segment durations.
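The abstract names the quantitative Fujisaki model without stating it. As a minimal sketch of the standard textbook formulation (not the authors' implementation), the model expresses ln F0(t) as a baseline ln Fb plus phrase components, which are responses to impulse-like phrase commands, and accent components, which are responses to rectangular accent commands. The helper names, the parameter values (alpha = 2.0/s, beta = 20.0/s, gamma = 0.9) and the command timings and amplitudes below are illustrative assumptions, not settings taken from the paper.

```python
import math
from typing import Sequence, Tuple


def phrase_component(t: float, alpha: float) -> float:
    """Phrase control response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_component(t: float, beta: float, gamma: float = 0.9) -> float:
    """Accent control response Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(
    t: float,
    fb: float,
    phrases: Sequence[Tuple[float, float]],        # (T0, Ap): onset time, phrase amplitude
    accents: Sequence[Tuple[float, float, float]],  # (T1, T2, Aa): onset, offset, accent amplitude
    alpha: float = 2.0,  # assumed illustrative value (1/s)
    beta: float = 20.0,  # assumed illustrative value (1/s)
    gamma: float = 0.9,  # assumed ceiling of the accent response
) -> float:
    """Return F0 in Hz: ln F0 = ln Fb + sum Ap*Gp(t-T0) + sum Aa*[Ga(t-T1) - Ga(t-T2)]."""
    ln_f0 = math.log(fb)
    for t0, ap in phrases:
        ln_f0 += ap * phrase_component(t - t0, alpha)
    for t1, t2, aa in accents:
        ln_f0 += aa * (accent_component(t - t1, beta, gamma) - accent_component(t - t2, beta, gamma))
    return math.exp(ln_f0)


if __name__ == "__main__":
    # Hypothetical commands: one phrase command at 0 s and one accent command from 0.3 s to 0.6 s.
    phrases = [(0.0, 0.5)]
    accents = [(0.3, 0.6, 0.4)]
    for i in range(0, 151, 25):
        t = i / 100.0
        print(f"{t:.2f} s  {fujisaki_f0(t, 80.0, phrases, accents):.1f} Hz")
```

Because the components add in the log-F0 domain, each command scales F0 multiplicatively, which is why the model is usually stated for ln F0 rather than for F0 itself.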

Dieter Mehnert | Hansjörg Mixdorff