Paper information - Unsupervised adaptation for HMM-based speech synthesis

It is now possible to synthesise speech using HMMs with quality comparable to that of unit-selection techniques. Generating speech from a model has many potential advantages over concatenating waveforms. The most exciting is model adaptation. It has been shown that supervised speaker adaptation can yield high-quality synthetic voices with an order of magnitude less data than is required to train a speaker-dependent model or to build a basic unit-selection system. Such supervised methods require labelled adaptation data for the target speaker. In this paper, we introduce a method capable of unsupervised adaptation, using only speech from the target speaker without any labelling.

Index Terms: speech synthesis, HMM-based speech synthesis, HTS, trajectory HMMs, speaker adaptation, MLLR
