Paper information - Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis

In standard approaches to hidden Markov model (HMM)-based speech synthesis, the window coefficients used to calculate dynamic features are pre-determined and fixed. This may not be optimal for capturing the various context-dependent dynamic characteristics of speech signals. This paper proposes a data-driven technique to estimate the window coefficients. The coefficients are optimized so as to maximize the likelihood of trajectory HMMs given the data. Experimental results show that the proposed technique achieves performance comparable to that of mean- and variance-updated trajectory HMMs in terms of the naturalness of synthesized speech, while offering significantly lower computational cost.
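To make the role of the window coefficients concrete, the sketch below shows how fixed windows turn a static feature trajectory into static, delta, and delta-delta observations. The coefficient values, the edge handling, and the names `apply_window` and `add_dynamic_features` are assumptions chosen for illustration; the abstract does not list them. The sketch covers only the fixed-window baseline, not the proposed estimation procedure.

```python
# Fixed windows commonly used in HMM-based synthesis toolkits (an assumption;
# the abstract does not list the coefficients).
STATIC_WIN = (0.0, 1.0, 0.0)
DELTA_WIN = (-0.5, 0.0, 0.5)
DELTA2_WIN = (1.0, -2.0, 1.0)


def apply_window(static, win):
    """Apply a 3-tap window over time to a 1-D static feature trajectory.

    Edge frames are handled by repeating the first and last values
    (an illustrative choice).
    """
    padded = [static[0]] + list(static) + [static[-1]]
    return [
        sum(w * padded[t + k] for k, w in enumerate(win))
        for t in range(len(static))
    ]


def add_dynamic_features(static, windows=(STATIC_WIN, DELTA_WIN, DELTA2_WIN)):
    """Return one [static, delta, delta-delta] observation per frame."""
    tracks = [apply_window(static, w) for w in windows]
    return [list(frame) for frame in zip(*tracks)]


trajectory = [0.0, 1.0, 2.0, 3.0, 4.0]  # toy linear ramp, not real speech data
observations = add_dynamic_features(trajectory)
```

In a data-driven variant such as the one the abstract describes, the tuples passed as `windows` would be the quantities being estimated rather than constants.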
