Paper information - Context-dependent additive log f_0 model for HMM-based speech synthesis

Context-dependent additive log f_0 model for HMM-based speech synthesis

Abstract

This paper proposes a context-dependent additive acoustic modelling technique and its application to logarithmic fundamental frequency (log F0) modelling for HMM-based speech synthesis. In the proposed technique, mean vectors of state-output distributions are composed as the weighted sum of decision tree-clustered context-dependent bias terms. Its model parameters and decision trees are estimated and built based on the maximum likelihood (ML) criterion. The proposed technique has the potential to capture the additive structure of log F0 contours. A preliminary experiment using a small database showed that the proposed technique yielded encouraging results.

Index Terms: speech synthesis, HMMs, log F0 modelling

1. Introduction

Hidden Markov model (HMM)-based speech synthesis [1] has grown in popularity in recent years. In this framework, the spectrum, excitation, and durations of speech are modelled simultaneously in a unified framework of HMMs. For a given text to be synthesized, speech parameter trajectories that maximise their output probabilities are generated from estimated HMMs under constraints between static and dynamic features [2]. Typical instances of this framework use mel-cepstral coefficients or line spectral pairs for their spectral parameters and log F
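The additive composition described in the abstract, where a state-output mean is a weighted sum of decision tree-clustered bias terms, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three components, the context keys (`phrase_initial`, `accented`, `voiced_neighbour`), the single-question "trees", the bias values, and the unit weights are all hypothetical. The maximum-likelihood estimation of the parameters and trees is not shown.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, bool]

# One additive component = (toy decision tree, per-leaf bias terms, weight).
# A "tree" here is just a function mapping a context to a leaf index;
# real trees would ask many context questions. All values are hypothetical.
Component = Tuple[Callable[[Context], int], List[float], float]


def phrase_leaf(ctx: Context) -> int:
    # Hypothetical phrase-level question.
    return 0 if ctx["phrase_initial"] else 1


def accent_leaf(ctx: Context) -> int:
    # Hypothetical accent-level question.
    return 0 if ctx["accented"] else 1


def phone_leaf(ctx: Context) -> int:
    # Hypothetical segment-level question.
    return 0 if ctx["voiced_neighbour"] else 1


COMPONENTS: List[Component] = [
    (phrase_leaf, [5.0, 4.5], 1.0),     # baseline log F0 level
    (accent_leaf, [0.25, 0.0], 1.0),    # accent-related offset
    (phone_leaf, [0.0, -0.125], 1.0),   # segmental offset
]


def additive_mean(ctx: Context, components: List[Component] = COMPONENTS) -> float:
    """Compose a mean as the weighted sum of the bias term selected by each tree."""
    return sum(weight * biases[tree(ctx)] for tree, biases, weight in components)


if __name__ == "__main__":
    ctx = {"phrase_initial": True, "accented": True, "voiced_neighbour": False}
    print(additive_mean(ctx))
```

Because each component clusters contexts with its own tree, the number of distinct means that can be expressed grows with the product of the leaf counts, while the number of stored bias terms grows only with their sum.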

Heiga Zen | Norbert Braunschweiler

[1] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[2] Heiga Zen, et al. Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems, 2009, INTERSPEECH.

[3] J. J. Odell, et al. The Use of Context in Large Vocabulary Speech Recognition, 1995.

[4] Yoshinori Sagisaka, et al. Statistical modelling of speech segment duration by constrained tree regression, 2000.

[5] Mark J. F. Gales. Cluster adaptive training of hidden Markov models, 2000, IEEE Trans. Speech Audio Process.

[6] Hiroya Fujisaki, et al. In search of models in speech communication research, 2009, INTERSPEECH.

[7] Keiichi Tokuda, et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, 1999, EUROSPEECH.

[8] Heiga Zen, et al. Acoustic modeling with contextual additive structure for HMM-based speech recognition, 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[9] Shinsuke Sakai. F0 modeling with multi-layer additive modeling based on a statistical learning technique, 2004, SSW.

[10] Keiichi Tokuda, et al. A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis, 2007, IEICE Trans. Inf. Syst.

[11] Keiichi Tokuda, et al. Multi-Space Probability Distribution HMM, 2002.

[12] M. Saunders, et al. Solution of Sparse Indefinite Systems of Linear Equations, 1975.

[13] Ren-Hua Wang, et al. Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge, 2008, INTERSPEECH.

[14] Edward I. George, et al. Bayesian Ensemble Learning, 2006, NIPS.

[15] J. Friedman. Stochastic gradient boosting, 2002.

[16] Frank K. Soong, et al. Generating natural F0 trajectory with additive trees, 2008, INTERSPEECH.

[17] Heiga Zen, et al. Details of the Nitech HMM-Based Speech Synthesis System for the Blizzard Challenge 2005, 2007, IEICE Trans. Inf. Syst.