Assessing the naturalness of Malay emotional voice corpora

This research reports the development of Malay emotional voice corpora and their evaluation through listening tests, and examines how the number of emotion choices offered to evaluators affects the evaluation results. The voice corpora comprise three emotions, namely anger, sadness and happiness, expressed by two male and two female actors. The corpora were evaluated in two separate listening tests involving a number of native Malay evaluators balanced for gender, age and profession. In the first listening test, evaluators were given twenty-five emotions to choose from; in the second, only five. Each test was conducted separately with a different group of evaluators. The results of the two tests differ markedly, with the emotion identification rate in the first test lower than in the second.
