Paper Information - Data-Driven Synthesis of Fundamental Frequency Contours for TTS Systems Based on a Generation Process Model

A data-driven method of fundamental frequency (F0) contour synthesis was developed for Japanese text-to-speech (TTS) conversion systems. In this method, synthesis is based on the F0 contour generation process model, and the model parameters for each accent phrase are estimated using statistical methods. Although the synthesized F0 contours had already been shown to sound as highly natural as those generated with heuristic rules arranged by experts, quality was occasionally low depending on the sentence being synthesized. In the current paper, information on sentence structure, automatically obtainable through the parsing process, is added to the input parameters of the statistical methods to obtain a better estimation. The experimental results showed that the new parameter was effective, especially for improving phrase component estimation. Furthermore, data-driven estimation of accent phrase boundaries for input text, a necessary step for realizing TTS conversion, was achieved in a similar way. The rate of correct estimation reached 90%.
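As a rough illustration of what the per-phrase model parameters control, the sketch below evaluates an F0 contour from phrase and accent commands. It assumes that the generation process model mentioned in the abstract is the commonly cited Fujisaki formulation, in which log F0 is the sum of a baseline value, phrase components, and accent components. The function names and the default values of `alpha`, `beta`, and `gamma` are illustrative assumptions, not taken from the paper, and the statistical estimation of the commands themselves is not shown.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism (assumed form)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma (assumed form)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """Return F0 values (Hz) at the given times (s).

    fb      : baseline frequency in Hz
    phrases : list of (onset, magnitude) phrase commands
    accents : list of (onset, offset, amplitude) accent commands
    """
    contour = []
    for t in times:
        # The components are summed in the log-frequency domain.
        log_f0 = math.log(fb)
        for onset, magnitude in phrases:
            log_f0 += magnitude * phrase_response(t - onset, alpha)
        for onset, offset, amplitude in accents:
            log_f0 += amplitude * (
                accent_response(t - onset, beta, gamma)
                - accent_response(t - offset, beta, gamma)
            )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands for one accent phrase; values are for illustration only.
times = [i * 0.01 for i in range(100)]
contour = f0_contour(times, fb=100.0,
                     phrases=[(0.1, 0.5)],
                     accents=[(0.3, 0.6, 0.4)])
```

Under this formulation, a data-driven approach only has to predict a handful of command timings and magnitudes per accent phrase rather than the full contour.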

Keikichi Hirose | Nobuaki Minematsu | Masaya Eto

[1] Yoshinori Sagisaka, et al. Automatic Extraction of F0 Control Rules Using Statistical Analysis, 1997.

[2] Keiichi Tokuda, et al. Hidden Markov models based on multi-space probability distribution for pitch pattern modeling, 1999, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP99).

[3] Keikichi Hirose, et al. Corpus-based synthesis of fundamental frequency contours based on a generation process model, 2001, INTERSPEECH.

[4] Keikichi Hirose, et al. A System for the Synthesis of High-Quality Speech from Texts on General Weather Conditions (Special Section on Speech Synthesis: Current Technologies and Equipment), 1993.

[5] Keikichi Hirose, et al. Analysis of voice fundamental frequency contours for declarative sentences of Japanese, 1984.