Paper information - A Bayesian approach to HMM-based speech synthesis

This paper proposes a new framework for speech synthesis based on the Bayesian approach. The Bayesian method is a statistical technique that estimates reliable predictive distributions by marginalizing out the model parameters. In the proposed framework, every process involved in constructing the system is derived from a single predictive distribution that directly represents the basic problem of speech synthesis. With an HMM as the likelihood function and under some approximations, the framework can be regarded as an application of the variational Bayesian method to HMM-based speech synthesis. Experimental results show that the proposed method outperforms the conventional one in a subjective test.
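
As a rough illustration of what "marginalizing model parameters" means here, the predictive distribution can be written out as below. The notation is an assumption made for this sketch and does not appear in the abstract: o denotes the speech features to be synthesized, s their labels, O and S the training features and labels, \lambda the HMM parameters, and Z the hidden state sequence. The factorized form q(Z) q(\lambda) is the usual variational Bayesian assumption, shown only to indicate where the approximation enters.

```latex
% Assumed notation: o, s = synthesis features and labels; O, S = training
% features and labels; \lambda = model parameters; Z = hidden state sequence.

% Bayesian predictive distribution: parameters are integrated out.
\begin{align}
p(o \mid s, O, S) &= \int p(o \mid s, \lambda)\, p(\lambda \mid O, S)\, d\lambda, \\
p(\lambda \mid O, S) &= \frac{p(O \mid S, \lambda)\, p(\lambda)}{p(O \mid S)}.
\end{align}

% Conventional maximum-likelihood alternative: a single point estimate.
\begin{equation}
\hat{\lambda} = \arg\max_{\lambda}\, p(O \mid S, \lambda),
\qquad o \sim p(o \mid s, \hat{\lambda}).
\end{equation}

% Variational lower bound on the log marginal likelihood (Jensen's inequality),
% assuming the factorized approximate posterior q(Z)\, q(\lambda).
\begin{equation}
\log p(O \mid S) \;\ge\;
\mathbb{E}_{q(Z)\, q(\lambda)}\!\left[
  \log \frac{p(O, Z \mid S, \lambda)\, p(\lambda)}{q(Z)\, q(\lambda)}
\right].
\end{equation}
```

The contrast between the first and third expressions is the point of the sketch: the Bayesian form averages over all plausible parameter values, whereas the conventional form commits to one estimate.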

Heiga Zen | Yoshihiko Nankaku | Keiichi Tokuda | Takashi Masuko | Kei Hashimoto
