Paper Information - Cross-Stream Dependency Modeling for HMM-Based Speech Synthesis

This paper presents a method for modeling the dependency between F0 and spectral features in HMM-based parametric speech synthesis. Conventional systems model these two features as independent streams, which is inconsistent with the fact that the F0 and spectral parameters extracted for model training always interact. A piecewise linear transform is introduced to model the dependency of the spectrum on F0 explicitly. Experimental results show that the proposed method improves the accuracy of spectral parameter prediction, provided that the F0 features are predicted based on a reliable voicing decision.
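As a rough illustration of what a piecewise linear dependency of spectrum on F0 could look like, the sketch below computes an F0-dependent spectral mean for a single HMM state. It is not the authors' formulation: the class name, the breakpoints on a log-F0 axis, the per-piece slope and offset vectors, and the fallback to a fixed mean for unvoiced frames are all assumptions made for this example.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


class PiecewiseLinearSpectralMean:
    """Hypothetical F0-dependent mean for one state's spectral stream.

    The log-F0 axis is split into pieces by sorted breakpoints. Within
    piece k, the spectral mean is slopes[k] * log_f0 + offsets[k],
    computed per spectral dimension.
    """

    def __init__(
        self,
        breakpoints: Sequence[float],
        slopes: Sequence[Sequence[float]],
        offsets: Sequence[Sequence[float]],
        unvoiced_mean: Sequence[float],
    ) -> None:
        if list(breakpoints) != sorted(breakpoints):
            raise ValueError("breakpoints must be sorted in ascending order")
        if len(slopes) != len(breakpoints) + 1 or len(offsets) != len(slopes):
            raise ValueError("need exactly len(breakpoints) + 1 pieces")
        self.breakpoints = list(breakpoints)
        self.slopes = [list(s) for s in slopes]
        self.offsets = [list(o) for o in offsets]
        self.unvoiced_mean = list(unvoiced_mean)

    def mean(self, log_f0: Optional[float]) -> List[float]:
        """Return the spectral mean for a frame; None marks an unvoiced frame."""
        if log_f0 is None:
            # Assumption: without a voiced F0 value, use an F0-independent mean.
            return list(self.unvoiced_mean)
        # Index of the piece whose log-F0 range contains this frame.
        k = bisect_right(self.breakpoints, log_f0)
        return [a * log_f0 + b for a, b in zip(self.slopes[k], self.offsets[k])]


# Two pieces split at log-F0 = 5.0, with a two-dimensional spectral vector.
model = PiecewiseLinearSpectralMean(
    breakpoints=[5.0],
    slopes=[[0.5, 0.0], [1.0, -1.0]],
    offsets=[[0.0, 1.0], [-2.5, 6.0]],
    unvoiced_mean=[0.0, 0.0],
)
low_piece = model.mean(4.0)   # uses piece 0
high_piece = model.mean(6.0)  # uses piece 1
unvoiced = model.mean(None)   # falls back to the fixed mean
```

The unvoiced fallback shows why the voicing decision matters in a scheme like this: a frame wrongly labelled as voiced or unvoiced is routed to the wrong mean, so any benefit of conditioning on F0 depends on that decision being reliable.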

Wei Zhang | Ren-Hua Wang | Zhen-Hua Ling

[1] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.

[2] Keiichi Tokuda, et al. Hidden Markov models based on multi-space probability distribution for pitch pattern modeling, 1999, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

[3] Christophe d'Alessandro, et al. Voice quality modification for emotional speech synthesis, 2003, INTERSPEECH.

[4] Heiga Zen, et al. Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005, 2007, IEICE Trans. Inf. Syst.

[5] Keiichi Tokuda, et al. Duration modeling for HMM-based speech synthesis, 1998, ICSLP.

[6] Abeer Alwan, et al. Text to Speech Synthesis: New Paradigms and Advances, 2004.

[7] Ren-Hua Wang, et al. USTC System for Blizzard Challenge 2006: An Improved HMM-Based Speech Synthesis Method, 2006.

[8] K. Tokuda, et al. Speech parameter generation from HMM using dynamic features, 1995, International Conference on Acoustics, Speech, and Signal Processing (ICASSP).