The NICT Entry for the Blizzard Challenge 2009: an Enhanced HMM-based Speech Synthesis System with Trajectory Training Considering Global Variance and State-Dependent Mixed Excitation

This paper describes the NICT speech synthesis system submitted to the Blizzard Challenge 2009: a hidden Markov model (HMM)-based synthesizer constructed by training trajectory HMMs considering global variance. To improve the naturalness of the synthesized speech, a mixed excitation approach based on closed-loop residual modeling through the training of state-dependent filters is employed. According to the official results, the system performs well in terms of naturalness and intelligibility, although the synthesized speech does not sound very similar to the original speaker.

Index Terms: speech synthesis, Blizzard Challenge, HMM-based speech synthesis, trajectory HMM, residual modeling.
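The abstract names two ingredients from the title: trajectory HMMs and global variance (GV). As a rough illustration only, the Python sketch below computes the mean trajectory that a trajectory HMM assigns to a fixed state sequence, c̄ = (WᵀΣ⁻¹W)⁻¹WᵀΣ⁻¹μ, which coincides with the output of standard maximum-likelihood parameter generation. It then measures the GV of that trajectory. It does not implement trajectory training itself or the paper's mixed excitation. Everything concrete here is an assumption for illustration: a single static feature, static and delta streams only, the delta window (-0.5, 0, 0.5) with edge frames clamped, diagonal covariances, and the hypothetical helper names build_window_matrix, solve_linear, trajectory_mean and global_variance.

```python
# Illustrative sketch (assumptions, not the paper's code): 1-D static feature,
# static + delta streams, delta window (-0.5, 0, 0.5) with clamped edges,
# diagonal covariances, per-frame statistics already taken from a state sequence.


def build_window_matrix(num_frames):
    """Return W (2T x T) so that o = W c stacks [static_t, delta_t] per frame."""
    delta_window = (-0.5, 0.0, 0.5)  # taps at frame offsets -1, 0, +1
    rows = []
    for t in range(num_frames):
        static_row = [0.0] * num_frames
        static_row[t] = 1.0
        delta_row = [0.0] * num_frames
        for offset, coef in zip((-1, 0, 1), delta_window):
            tau = min(max(t + offset, 0), num_frames - 1)  # clamp at the edges
            delta_row[tau] += coef
        rows.append(static_row)
        rows.append(delta_row)
    return rows


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def trajectory_mean(means, variances):
    """Mean trajectory c_bar = (W^T S^-1 W)^-1 W^T S^-1 mu.

    means, variances: length-2T lists ordered [static_0, delta_0, static_1, ...].
    """
    num_frames = len(means) // 2
    w = build_window_matrix(num_frames)
    precision = [1.0 / v for v in variances]  # diagonal S^-1
    rows = range(2 * num_frames)
    r_mat = [[sum(w[i][j] * precision[i] * w[i][k] for i in rows)
              for k in range(num_frames)] for j in range(num_frames)]
    r_vec = [sum(w[i][j] * precision[i] * means[i] for i in rows)
             for j in range(num_frames)]
    return solve_linear(r_mat, r_vec)


def global_variance(trajectory):
    """GV: variance of one static feature over an utterance."""
    mean = sum(trajectory) / len(trajectory)
    return sum((c - mean) ** 2 for c in trajectory) / len(trajectory)


if __name__ == "__main__":
    # Static means step from 0 to 1 halfway; delta means are zero.
    static_means = [0.0] * 4 + [1.0] * 4
    means, variances = [], []
    for mu in static_means:
        means += [mu, 0.0]
        variances += [1.0, 0.01]  # tight delta variance -> strong smoothing
    c = trajectory_mean(means, variances)
    print("GV of static means:", global_variance(static_means))
    print("GV of generated trajectory:", global_variance(c))
```

In the example, the tight delta variances turn the step in the static means into a smooth ramp, so the GV of the generated trajectory comes out below that of the static means. This over-smoothing is the effect that GV-based generation and training are meant to counteract.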
