Paper information - Informed blending of databases for emotional speech synthesis

The goal of this project was to build a unit selection voice that could portray emotions with varying intensities. A suitable definition of an emotion was developed, along with a descriptive framework that supported the work carried out. A single speaker was recorded portraying happy and angry speaking styles. A neutral database was also recorded. A target cost function was implemented that chose units according to emotion mark-up in the database. The Dictionary of Affect supported the emotional target cost function by providing an emotion rating for words in the target utterance. If a word was particularly 'emotional', units from that emotion were favoured. In addition, intensity could be varied, which resulted in a bias towards selecting a greater number of emotional units. A perceptual evaluation was carried out, and subjects were able to reliably recognise emotions with varying amounts of emotional units present in the target utterance.
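To make the selection idea concrete, the following is a minimal sketch of how an emotion-aware target cost of this kind might look. It is not the authors' implementation. The ratings table, the threshold rule tying intensity to word ratings, the fixed mismatch penalty, and all function names are assumptions introduced purely for illustration.

```python
# Hypothetical per-word emotion ratings in [0, 1]. These values are invented
# for illustration and are NOT taken from the Dictionary of Affect.
WORD_RATINGS = {"wonderful": 0.9, "day": 0.5, "the": 0.1}


def preferred_emotion(word, target_emotion, intensity, ratings=WORD_RATINGS):
    """Return the emotion label whose units should be favoured for this word.

    Assumed rule: a word counts as 'emotional' when its rating reaches
    1 - intensity, so raising the intensity lets more words qualify.
    """
    rating = ratings.get(word.lower(), 0.0)
    threshold = 1.0 - intensity
    return target_emotion if rating >= threshold else "neutral"


def emotion_target_cost(unit_emotion, word, target_emotion, intensity, penalty=1.0):
    """Zero cost if the candidate unit's emotion mark-up matches the
    preferred emotion for the word, otherwise a fixed (assumed) penalty."""
    wanted = preferred_emotion(word, target_emotion, intensity)
    return 0.0 if unit_emotion == wanted else penalty


def count_emotional_words(words, target_emotion, intensity):
    """Count the words for which units of the target emotion are favoured."""
    return sum(
        preferred_emotion(w, target_emotion, intensity) == target_emotion
        for w in words
    )
```

In this sketch, calling `count_emotional_words(["the", "wonderful", "day"], "happy", intensity)` with increasing intensity values favours emotional units for more of the words, which mirrors the bias described in the abstract. In a real unit selection system this term would be one weighted component of the overall target cost.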

Korin Richmond | Gregor Hofer | Robert A. J. Clark
