Paper information - Speaker-Independent HMM-based Speech Synthesis System: HTS-2007 System for the Blizzard Challenge 2007

This paper describes an HMM-based speech synthesis system developed by the HTS working group for the Blizzard Challenge 2007. To further explore the potential of HMM-based speech synthesis, we incorporate into our conventional system new features that underpin a speaker-independent approach: speaker adaptation techniques, adaptive training for hidden semi-Markov models (HSMMs), and full covariance modeling using constrained structural maximum a posteriori linear regression (CSMAPLR) transforms.
