The goal of this project was to build a unit selection voice that could portray emotions with varying intensities. A suitable definition of emotion was developed, along with a descriptive framework to support the work. A single speaker was recorded portraying happy and angry speaking styles, and a neutral database was also recorded. A target cost function was implemented that chose units according to the emotion mark-up in the database. The Dictionary of Affect supported this emotional target cost function by providing an emotion rating for words in the target utterance: if a word was particularly 'emotional', units from that emotion were favoured. In addition, the intensity could be varied, which biased selection towards a greater number of emotional units. A perceptual evaluation was carried out, and subjects were able to reliably recognise emotions across varying amounts of emotional units in the target utterance.
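The emotional target cost described above can be illustrated with a minimal sketch. This is not the project's implementation. The `Unit` class, the `emotion_cost` and `select_unit` functions, the 0-to-1 rating scale, the 0.5 threshold and the penalty values are all hypothetical, chosen only to show how a word's emotion rating and an intensity setting might bias selection towards emotional units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    emotion: str      # emotion mark-up in the database: "neutral", "happy" or "angry"
    base_cost: float  # stand-in for the usual (non-emotional) target cost


def emotion_cost(unit, target_emotion, word_rating, intensity, threshold=0.5):
    """Hypothetical penalty for a unit whose mark-up differs from the target emotion.

    word_rating: assumed 0..1 emotion rating for the word (e.g. from a lexicon lookup).
    intensity:   assumed 0..1 setting; higher values favour emotional units more.
    """
    if unit.emotion == target_emotion:
        return 0.0            # matching units are never penalised
    if word_rating >= threshold:
        return 1.0            # 'emotional' word: mismatching units get the full penalty
    return intensity          # other words: penalty grows with the intensity setting


def select_unit(candidates, target_emotion, word_rating, intensity):
    """Pick the candidate with the lowest combined cost."""
    return min(
        candidates,
        key=lambda u: u.base_cost
        + emotion_cost(u, target_emotion, word_rating, intensity),
    )


candidates = [Unit("n1", "neutral", 0.2), Unit("h1", "happy", 0.5)]

# An 'emotional' word favours the happy unit even at zero intensity.
emotional_word = select_unit(candidates, "happy", word_rating=0.9, intensity=0.0)

# A non-emotional word keeps the cheaper neutral unit at low intensity...
low_intensity = select_unit(candidates, "happy", word_rating=0.1, intensity=0.2)

# ...but switches to the happy unit once intensity is raised.
high_intensity = select_unit(candidates, "happy", word_rating=0.1, intensity=0.8)
```

In this sketch, raising `intensity` increases the number of words for which an emotional unit wins, which mirrors the bias described in the abstract. A real system would combine such a term with join costs and the other target cost components.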