Multi-Layer F0 Modeling for HMM-Based Speech Synthesis

This paper proposes a two-layer fundamental frequency (F0) modeling method for HMM-based parametric speech synthesis. In the conventional HMM-based speech synthesis system, F0 models are trained for each context-dependent phoneme. To account for the suprasegmental characteristics of F0, this paper introduces an explicit syllable-layer F0 model. At the synthesis stage, the F0 contour is generated by maximizing the combined likelihood of the phone-layer and syllable-layer F0 models. Objective and subjective evaluations in our experiments show that the proposed multi-layer F0 modeling method improves F0 prediction for emotional speech synthesis.
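As a rough illustration of generating a contour by maximizing a combined likelihood, the sketch below works through a deliberately simplified case. Everything in it is an assumption made for illustration and is not taken from the paper: each frame has an independent Gaussian phone-layer model over static log-F0 only (no dynamic features, no voiced/unvoiced modeling), each syllable has a Gaussian model over the mean log-F0 of its frames, and the two log-likelihoods are combined with a hypothetical scalar `weight`. The function name `generate_f0` and its arguments are likewise hypothetical. Under these assumptions the maximizer has a closed form per syllable.

```python
def generate_f0(phone_mean, phone_var, syllables, weight=1.0):
    """Maximize a weighted sum of two Gaussian log-likelihoods (illustrative only).

    phone_mean, phone_var: per-frame mean and variance of the assumed
        phone-layer model (static log-F0 only).
    syllables: list of (start, end, syl_mean, syl_var), where frames
        start..end-1 belong to the syllable and (syl_mean, syl_var) is the
        assumed syllable-layer Gaussian over the syllable's mean log-F0.
    weight: hypothetical weight on the syllable-layer log-likelihood.
    """
    f0 = list(phone_mean)  # frames outside any syllable keep the phone-layer mean
    for start, end, syl_mean, syl_var in syllables:
        n = end - start
        mu = phone_mean[start:end]
        var = phone_var[start:end]
        mean_mu = sum(mu) / n
        mean_var = sum(var) / n
        # Setting the gradient to zero gives, for each frame t in the syllable:
        #   (f_t - mu_t) / var_t + weight * d / (n * syl_var) = 0,
        # where d = mean(f) - syl_mean. Averaging over t and solving for d:
        d = (mean_mu - syl_mean) / (1.0 + weight * mean_var / (n * syl_var))
        scale = weight * d / (n * syl_var)
        for i in range(n):
            f0[start + i] = mu[i] - var[i] * scale
    return f0


# Toy input: four frames, two syllables of two frames each (values are made up).
phone_mean = [5.0, 5.0, 5.2, 5.2]
phone_var = [0.01, 0.01, 0.01, 0.01]
syllables = [(0, 2, 5.1, 0.01), (2, 4, 5.0, 0.01)]
contour = generate_f0(phone_mean, phone_var, syllables, weight=1.0)
```

With `weight=0` the result reduces to the phone-layer means; increasing `weight` pulls each syllable's average toward the syllable-layer mean, with high-variance frames moving the most. A full system would also have to handle dynamic features and unvoiced frames, which this sketch omits.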