Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis

This paper describes a speaking style adaptation technique for HMM-based speech synthesis based on maximum likelihood linear regression (MLLR). Because speaking styles and emotional expressions are characterized by many suprasegmental features as well as segmental ones, speaking style adaptation must adapt the suprasegmental features too. To achieve this, we tie the regression matrices using the context clustering decision trees constructed in the training stage. Using this technique, we adapt an initial "reading" style model to "joyful" and "sad" styles. Experimental results show that, with 50 adaptation sentences, speech samples generated from the adapted models were judged similar to the target speaking style at rates of 92% for the joyful style and 70% for the sad style.
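The tying idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard MLLR mean transform (adapted mean = A * mean + b) and a common occupancy-threshold rule in which each leaf distribution uses the transform of its nearest tree ancestor with enough adaptation data. The `Node` class, the node names, the frame counts, the threshold, and all matrix values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class Node:
    """A node of a hypothetical context clustering decision tree."""
    name: str
    parent: Optional["Node"] = None
    frames: int = 0  # adaptation frames observed in this node's subtree


def tied_node(leaf: Node, min_frames: int) -> Node:
    """Walk up from a leaf to the nearest node with enough adaptation data."""
    node = leaf
    while node.frames < min_frames and node.parent is not None:
        node = node.parent
    return node


def adapt_mean(mean: Vector, A: Matrix, b: Vector) -> Vector:
    """Apply the MLLR mean transform: adapted mean = A * mean + b."""
    return [sum(a * m for a, m in zip(row, mean)) + bias
            for row, bias in zip(A, b)]


# Toy tree; the context questions and frame counts are made up.
root = Node("root", frames=500)
vowel = Node("is_vowel", parent=root, frames=320)
consonant = Node("is_consonant", parent=root, frames=180)
leaf_a = Node("leaf_a", parent=vowel, frames=40)
leaf_k = Node("leaf_k", parent=consonant, frames=15)

# One (A, b) pair per node that has enough data; the values are illustrative,
# not estimated from any speech data.
transforms: Dict[str, Tuple[Matrix, Vector]] = {
    "root": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    "is_vowel": ([[1.1, 0.0], [0.0, 0.9]], [0.5, -0.2]),
}


def adapt_leaf(leaf: Node, mean: Vector,
               min_frames: int = 200) -> Tuple[str, Vector]:
    """Return the tying node's name and the adapted mean for a leaf."""
    node = tied_node(leaf, min_frames)
    A, b = transforms[node.name]
    return node.name, adapt_mean(mean, A, b)
```

With these toy values, `adapt_leaf(leaf_a, [2.0, 1.0])` uses the `is_vowel` transform, while `leaf_k` falls back to the root transform because its parent has fewer frames than the threshold. Because the tree questions can refer to context beyond the phone itself, tying along such a tree lets distributions that share suprasegmental context share a transform.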
