Constructing emotional speech synthesizers with limited speech database

This paper describes an emotional speech synthesis system based on HMMs and related modeling techniques. Concatenative speech synthesis requires that all concatenation units be recorded beforehand and made available at synthesis time. Adopting this approach for the wide variety of emotions possible in human speech implies repeating the recording process for every targeted emotion, which makes the task challenging and time consuming. In this paper, we propose an HMM-based emotional speech synthesis technique for the case where only a limited amount of training data is available, directly incorporating the results of subjective evaluations performed on the training data. Listening tests on the synthesized speech suggest that the proposed technique improves the emotional content of the synthesized speech.
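The core idea, steering statistical models with subjective evaluation results collected on the training data, can be illustrated with a small sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' actual method: it assumes that listener scores for each training subset are normalized into interpolation weights over the Gaussian mean vectors of corresponding HMM states, so that subsets judged more emotionally expressive contribute more to the synthesis model. All names, scores, and dimensions are hypothetical.

```python
import numpy as np

# Hypothetical subjective scores (e.g., perceived emotional strength on a
# 1-to-5 scale) assigned by listeners to three training subsets.
subjective_scores = np.array([4.2, 2.1, 3.5])

# Hypothetical mean vectors of the Gaussian output distribution of one HMM
# state, estimated separately on each training subset (rows = subsets,
# columns = spectral feature dimensions).
state_means = np.array([
    [1.0, 0.2, -0.3],
    [0.8, 0.1, -0.1],
    [1.1, 0.3, -0.4],
])

def interpolate_means(means: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Weight each subset's state mean by its normalized subjective score,
    yielding a single interpolated mean biased toward the subsets that
    listeners judged most emotionally expressive."""
    weights = scores / scores.sum()
    return weights @ means

emotional_mean = interpolate_means(state_means, subjective_scores)
print(emotional_mean)
```

In practice such weights would be applied consistently across all state parameters (spectrum, pitch, and duration models); the sketch shows only the weighting step for a single state's mean vector.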
