An F0 contour control model for totally speaker driven text to speech system

The Totally Speaker Driven Text to Speech System produces high-quality, natural speech that resembles the acoustic and prosodic characteristics of the original speech corpus. In this system's F0 contour control, the F0 contour of a whole sentence is produced by concatenating segmental F0 contours, each generated by modifying a representative vector of typical F0 contours. The representative vectors are selected from an F0 contour codebook designed to minimize the approximation error between the F0 contours generated by the proposed model and real F0 contours extracted from a speech corpus. Experiments with a Japanese speech corpus confirmed that F0 contours can be modeled with small approximation errors using only 48 representative vectors, and that the synthetic speech sounded very natural and resembled the prosodic characteristics of the original speaker.
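The abstract does not specify how a representative vector is "modified" or how the codebook is trained, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that each representative vector is a fixed-length vector of log-F0 values; that modifying it means linearly stretching it to the segment's frame count and adding an offset; and that the codebook is trained Lloyd-style (assign each training contour to its best codeword, then replace each codeword by the least-squares vector for its members, with per-segment offsets optimised out). All function names and parameters (`warp_matrix`, `train_codebook`, `k`, `iters`, and so on) are hypothetical.

```python
import numpy as np


def warp_matrix(k: int, n: int) -> np.ndarray:
    """Return the n x k linear-interpolation matrix that stretches a k-point vector to n points."""
    W = np.zeros((n, k))
    # Map each of the n output samples onto the input index axis [0, k - 1].
    pos = np.linspace(0.0, k - 1, n) if n > 1 else np.zeros(1)
    for row, p in enumerate(pos):
        lo = min(int(np.floor(p)), k - 1)
        hi = min(lo + 1, k - 1)
        frac = p - lo
        W[row, lo] += 1.0 - frac
        W[row, hi] += frac
    return W


def generate_segment(v: np.ndarray, n: int, offset: float) -> np.ndarray:
    """Assumed modification: stretch representative vector v to n frames, then shift by offset."""
    return warp_matrix(len(v), n) @ v + offset


def fit_segment(v: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Return the least-squares offset and squared error for approximating contour y with v."""
    w = warp_matrix(len(v), len(y)) @ v
    offset = float(np.mean(y - w))
    return offset, float(np.sum((w + offset - y) ** 2))


def select_codeword(codebook: list[np.ndarray], y: np.ndarray) -> tuple[int, float]:
    """Choose the codeword (and its offset) with the smallest approximation error for y."""
    fits = [fit_segment(v, y) for v in codebook]
    idx = int(np.argmin([err for _, err in fits]))
    return idx, fits[idx][0]


def update_codeword(k: int, segments: list[np.ndarray]) -> np.ndarray:
    """Least-squares centroid: the k-point vector minimising the total error over its segments,
    with each segment's offset optimised out by a centring projector."""
    A = np.zeros((k, k))
    rhs = np.zeros(k)
    for y in segments:
        W = warp_matrix(k, len(y))
        P = np.eye(len(y)) - 1.0 / len(y)  # I - (1/n) 11^T removes the free offset
        A += W.T @ P @ W
        rhs += W.T @ P @ y
    # A is singular along the all-ones vector (offsets absorb constants), so
    # lstsq returns the minimum-norm, zero-mean solution.
    return np.linalg.lstsq(A, rhs, rcond=None)[0]


def train_codebook(segments, size, k=8, iters=10, seed=0):
    """Lloyd-style training: alternate best-codeword assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Initialise each codeword from a random training segment resampled to k points.
    picks = rng.choice(len(segments), size=size, replace=False)
    codebook = [warp_matrix(len(segments[i]), k) @ segments[i] for i in picks]
    for _ in range(iters):
        labels = [select_codeword(codebook, y)[0] for y in segments]
        for c in range(size):
            members = [y for y, lab in zip(segments, labels) if lab == c]
            if members:  # an unused codeword is left unchanged
                codebook[c] = update_codeword(k, members)
    return codebook


def generate_sentence(codebook, plan):
    """Concatenate segments; plan holds one (codeword index, frame count, offset) per segment."""
    return np.concatenate([generate_segment(codebook[i], n, b) for i, n, b in plan])
```

Because the assignment step and the least-squares update each minimise the same total squared error, the training error cannot increase from one iteration to the next, which mirrors the abstract's goal of a codebook that minimises approximation error. To mirror the reported setup, one would pass segmental contours extracted from a corpus as `segments` and set `size=48`; the segment unit, the frame-level sampling, and `k` remain assumptions of this sketch.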

Takehiko Kagoshima | Masami Akamine | Masahiro Morita | Shigenobu Seto
