Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis

This paper presents a method to model the global variance (GV) of log power spectrums derived from the line spectral pairs (LSPs) in a sentence for HMM-based parametric speech synthesis. Different from the conventional GV method where the observations for GV model training are the variances of spectral parameters for each training sentence, our proposed method directly models the temporal variances of each frequency point in the spectral envelope reconstructed using LSPs. At synthesis stage, the likelihood function of trained GV model is integrated into the maximum likelihood parameter generation algorithm to alleviate the over-smoothing effect on the generated spectral structures. Experiment results show that the proposed method can outperform the conventional GV method when LSPs are used as the spectral parameters and improve the naturalness of synthetic speech significantly.

[1]  Zhi-Jie Yan,et al.  Rich context modeling for high quality HMM-based TTS , 2009, INTERSPEECH.

[2]  H. Zen,et al.  Minimum generation error criterion considering global/local variance for HMM-based speech synthesis , 2008 .

[3]  Heiga Zen,et al.  Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005 , 2007, IEICE Trans. Inf. Syst..

[4]  Keiichi Tokuda,et al.  Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[5]  Keiichi Tokuda,et al.  Mixed excitation for HMM-based speech synthesis , 2001, INTERSPEECH.

[6]  F. Itakura Line spectrum representation of linear predictor coefficients of speech signals , 1975 .

[7]  Keiichi Tokuda,et al.  Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis , 1999, EUROSPEECH.

[8]  Ren-Hua Wang,et al.  USTC System for Blizzard Challenge 2006 an Improved HMM-based Speech Synthesis Method , 2006, Blizzard Challenge.

[9]  Keiichi Tokuda,et al.  A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis , 2007, IEICE Trans. Inf. Syst..

[10]  Koichi Shinoda,et al.  MDL-based context-dependent subword modeling for speech recognition , 2000 .

[11]  Zhenhua Ling HMM-based Unit Selection Using F , 2006 .

[12]  Hideki Kawahara,et al.  Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..