Cross-Stream Dependency Modeling for HMM-Based Speech Synthesis

This paper presents a method for modeling the dependency between F0 and spectral features in HMM-based parametric speech synthesis. Conventional systems model these two features as independent streams, which is inconsistent with the fact that the F0 and spectral parameters extracted for model training always interact. A piecewise linear transform is introduced to explicitly model the dependency of the spectrum on F0. Experimental results show that the proposed method improves the accuracy of spectral parameter prediction, provided that the F0 features are predicted from a reliable voicing decision.
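
As a rough illustration of what a piecewise linear dependency of the spectrum on F0 could look like, the sketch below computes a spectral mean vector as a linear function of log F0, with a separate slope and offset for each segment of the log-F0 axis. The abstract does not give the exact parameterization, so the function name `piecewise_linear_mean`, the use of log F0, the segment boundaries, and all numeric values are assumptions chosen only for illustration.

```python
from bisect import bisect_right


def piecewise_linear_mean(log_f0, boundaries, slopes, offsets):
    """Return a spectral mean vector conditioned on log F0 (hypothetical form).

    boundaries: sorted interior breakpoints on the log-F0 axis (K - 1 values)
    slopes, offsets: K vectors each, one (slope, offset) pair per segment
    """
    # Index of the segment that contains log_f0.
    k = bisect_right(boundaries, log_f0)
    # Within a segment, each spectral dimension is linear in log F0.
    return [a * log_f0 + b for a, b in zip(slopes[k], offsets[k])]


# Toy setup (assumed values): a 2-dimensional spectral vector and
# two segments split at log F0 = 5.0, continuous at the breakpoint.
boundaries = [5.0]
slopes = [[0.5, -0.25], [1.0, 0.0]]
offsets = [[0.0, 1.0], [-2.5, -0.25]]

low = piecewise_linear_mean(4.0, boundaries, slopes, offsets)
high = piecewise_linear_mean(6.0, boundaries, slopes, offsets)
print(low, high)
```

In this toy setup the two segments agree at the breakpoint, so the predicted mean varies continuously with log F0. The sketch covers voiced frames only: unvoiced frames have no F0 value to condition on, which is one reason the reliability of the voicing decision matters.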