Trajectory training considering global variance for HMM-based speech synthesis

This paper presents a novel method for training hidden Markov models (HMMs) for use in HMM-based speech synthesis. The primary goal of HMM parameter optimization is to ensure that parameters generated from the trained models exhibit properties similar to those of natural speech. Two major problems in conventional training are addressed: 1) the inconsistency between the optimization criteria used in training and in synthesis; and 2) the over-smoothing caused by the statistical modeling process. The proposed method integrates the global variance (GV) criterion into a trajectory training method, giving a unified framework for training and synthesis that provides both a consistent optimization criterion and a closed-form solution for parameter generation. The experimental results demonstrate that the proposed method yields a significant improvement in the naturalness of synthetic speech.
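To make the over-smoothing problem concrete, the sketch below computes the global variance of a parameter trajectory, taken here as the per-dimension variance over all frames of one utterance, and shows that a smoothed trajectory has a smaller GV than the original. It is only an illustration, not the paper's training procedure: the function names, the moving-average smoother standing in for over-smoothing, and the toy data are all assumptions made for this example.

```python
from statistics import pvariance


def global_variance(trajectory):
    """Per-dimension variance over all frames of one utterance.

    `trajectory` is a list of frames, each frame a list of feature values.
    """
    dims = zip(*trajectory)  # transpose: frames x dims -> dims x frames
    return [pvariance(d) for d in dims]


def moving_average(trajectory, width=3):
    """Crude stand-in for over-smoothing (an assumption of this sketch):
    average each frame with its neighbours, truncating the window at the edges."""
    half = width // 2
    smoothed = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - half): t + half + 1]
        smoothed.append([sum(col) / len(window) for col in zip(*window)])
    return smoothed


# Toy two-dimensional "natural" trajectory (hypothetical values).
natural = [[0.0, 1.0], [2.0, -1.0], [-1.0, 3.0], [3.0, 0.0], [0.5, 2.0]]
smoothed = moving_average(natural)

gv_natural = global_variance(natural)
gv_smoothed = global_variance(smoothed)

print("GV of natural trajectory: ", gv_natural)
print("GV of smoothed trajectory:", gv_smoothed)
```

For this toy trajectory the smoothed version has a lower variance in both dimensions, which is the kind of loss a GV term is meant to penalize.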
