Paper Information - Multi-Layer F0 Modeling for HMM-Based Speech Synthesis

Multi-Layer F0 Modeling for HMM-Based Speech Synthesis

This paper proposes a two-layer fundamental frequency (F0) modeling method for HMM-based parametric speech synthesis. In a conventional HMM-based speech synthesis system, F0 models are trained for each context-dependent phoneme. To account for the super-segmental characteristics of F0 features, this paper introduces an explicit syllable-layer F0 model. At the synthesis stage, the F0 contour is generated by maximizing the combined likelihood of the phone-layer and syllable-layer F0 models. Objective and subjective evaluation results show that the proposed multi-layer F0 modeling method can improve F0 prediction for emotional speech synthesis.
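The generation step described in the abstract, maximizing a combined likelihood over two layers, can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes diagonal Gaussians at both layers, assumes the syllable layer models only the mean log-F0 of each syllable, and ignores voiced/unvoiced decisions and dynamic features. The function name `generate_f0`, its parameters, and the `weight` term are hypothetical and chosen only for illustration.

```python
import numpy as np

def generate_f0(phone_mean, phone_var, syl_index, syl_mean, syl_var, weight=1.0):
    """Return the contour x that maximizes
    log N(x; phone_mean, phone_var) + weight * log N(A x; syl_mean, syl_var),
    where A averages the frames belonging to each syllable (an assumption).
    Every syllable is assumed to contain at least one frame."""
    phone_mean = np.asarray(phone_mean, dtype=float)
    phone_prec = 1.0 / np.asarray(phone_var, dtype=float)
    syl_index = np.asarray(syl_index)
    syl_mean = np.asarray(syl_mean, dtype=float)
    syl_prec = 1.0 / np.asarray(syl_var, dtype=float)
    n_frames, n_syl = phone_mean.size, syl_mean.size

    # A maps a frame-level contour to per-syllable average log-F0.
    A = np.zeros((n_syl, n_frames))
    for s in range(n_syl):
        frames = np.flatnonzero(syl_index == s)
        A[s, frames] = 1.0 / frames.size

    # Both terms are quadratic in x, so setting the gradient to zero
    # gives one linear system: (P + w A^T S A) x = P mu_p + w A^T S mu_s.
    lhs = np.diag(phone_prec) + weight * A.T @ np.diag(syl_prec) @ A
    rhs = phone_prec * phone_mean + weight * A.T @ (syl_prec * syl_mean)
    return np.linalg.solve(lhs, rhs)

# Two frames in one syllable: the phone layer prefers 5.0, the syllable layer 6.0.
contour = generate_f0([5.0, 5.0], [1.0, 1.0], [0, 0], [6.0], [0.5])
```

With `weight=0.0` the syllable term vanishes and the result is the phone-layer mean. A larger weight pulls the syllable average of the contour toward the syllable-layer mean.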

Li-Rong Dai | Zhen-Hua Ling | Cheng-Cheng Wang | Bu-Fan Zhang

[1] Heng Lu, et al. The USTC and iFlytek Speech Synthesis Systems for Blizzard Challenge 2007, 2007.

[2] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.

[3] Keikichi Hirose, et al. Analysis of voice fundamental frequency contours for declarative sentences of Japanese, 1984.

[4] Yi Xu, et al. A pitch target approximation model for F0 contours in Mandarin, 1999.

[5] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[6] Gérard Bailly, et al. A superposed prosodic model for Chinese text-to-speech synthesis, 2004, 2004 International Symposium on Chinese Spoken Language Processing.

[7] Keiichi Tokuda, et al. Hidden Markov models based on multi-space probability distribution for pitch pattern modeling, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[8] Koichi Shinoda, et al. MDL-based context-dependent subword modeling for speech recognition, 2000.

[9] Keiichi Tokuda, et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, 1999, EUROSPEECH.