This study reports modelling experiments on expressing three basic emotions (joy, sadness and anger) in Estonian parametric text-to-speech synthesis, using both a male and a female voice. For each emotion, three test models were constructed and evaluated by subjects in perception tests. The test models were based on parameter values characteristic of each basic emotion, as determined from human speech. In synthetic speech, the test subjects recognized sadness most accurately and joy least accurately. For the synthesized male voice, the model with enhanced parameter values performed best for all three emotions. For the synthesized female voice, different emotions called for different models: the model with decreased values was the most suitable for expressing joy, and the model with enhanced values was the most suitable for expressing sadness and anger. Logistic regression was applied to the perception test results to determine the significance and contribution of each acoustic parameter in the emotion models, and whether the parameter values need adjusting.

Summary. Kairi Tamuri and Meelis Mihkla: Põhiemotsioonide väljendusvõimalused eestikeelsel parameetrilisel kõnesünteesil (Possibilities for expressing basic emotions in Estonian parametric speech synthesis). The aim of the research was to carry out modelling experiments for expressing three basic emotions (joy, sadness and anger) in Estonian parametric speech synthesis, on the basis of both a male and a female synthetic voice. To this end, three different test models were compiled for each emotion and rated by test subjects in perception tests. The test models were based on parameter values characteristic of the basic emotions, determined from human speech. Of the emotions, sadness was recognized best in synthetic speech and joy worst. The test results showed that, whereas for the male synthetic voice the enhanced-values model worked best for all three emotions, for the female synthetic voice different emotions required different models: the decreased-values model suited the expression of joy best, and the enhanced-values model suited the expression of sadness and anger. The perception test results were analysed with logistic regression to establish the significance and share of the individual acoustic parameters in the emotion models and the need to correct the parameter values.

Keywords: Estonian, emotions, speech synthesis, acoustic model, speech rate, intensity, fundamental frequency
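The logistic regression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the data are synthetic, the predictor names (`speech_rate`, `intensity`, `f0`) are taken only from the keywords, and the "true" weights are hypothetical values chosen for the demonstration. The sketch fits a binary model (emotion recognized or not) by plain gradient ascent and reports each parameter's contribution as an odds ratio. Significance testing, which the study also mentions, is omitted here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights (bias first) by batch gradient ascent on the mean log-likelihood."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic stand-in for perception-test responses (NOT data from the study).
# Each row holds standardized acoustic parameters of one stimulus; the label is
# 1 if a hypothetical listener recognized the intended emotion, else 0.
random.seed(0)
NAMES = ["speech_rate", "intensity", "f0"]   # assumed predictors
TRUE_W = [0.0, -1.5, 0.2, -0.8]              # hypothetical bias and weights
X, y = [], []
for _ in range(400):
    xi = [random.gauss(0.0, 1.0) for _ in NAMES]
    p = sigmoid(TRUE_W[0] + sum(w * x for w, x in zip(TRUE_W[1:], xi)))
    X.append(xi)
    y.append(1 if random.random() < p else 0)

weights = fit_logistic(X, y)

# exp(coefficient) is the multiplicative change in the odds of recognition
# for a one-standard-deviation increase in that parameter.
odds_ratios = {name: math.exp(w) for name, w in zip(NAMES, weights[1:])}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio per 1 SD = {ratio:.2f}")
```

An odds ratio well below or above 1 marks a parameter that strongly affects recognition in the fitted model, and a ratio near 1 marks one that contributes little. A real analysis would more likely use a statistics package that also reports standard errors and p-values.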