Spoken Emotion Classification Using ToBI Features and GMM

This study investigated the usefulness of ToBI marks in determining the emotional state conveyed in speech. A Gaussian mixture model (GMM) was used as the classifier. Three classification systems were developed, each based on a different feature vector: (a) a classical approach that used signal pitch and energy features; (b) a ToBI-only approach that used features from the tone and break tiers; and (c) a combined system that used the features of both (a) and (b). The ToBI tone tier elements were determined automatically from pitch information. Three emotional states were investigated: happiness, anger, and sadness. The overall success rate achieved by the combined system was between 75% and 100%. This work indicated that the ToBI features alone were very useful for the classification of emotion, and that detection improved when classical features were used in conjunction with ToBI.
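The classifier structure described above can be illustrated with a minimal sketch: one GMM is fitted per emotion, and a test vector is assigned to the emotion whose model gives it the highest log-likelihood. Everything specific in this sketch is an assumption and is not taken from the study: the two-dimensional toy feature vectors (standing in for, say, a pitch and an energy statistic), the synthetic data, the two mixture components, the diagonal covariances, and the use of the standard EM procedure for fitting. The helper names `fit_gmm`, `gmm_loglik`, and `classify` are hypothetical.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) over a list of finite values."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def fit_gmm(data, k=2, iters=50, seed=0, var_floor=1e-3):
    """Fit a k-component diagonal GMM with EM (assumed setup, not the paper's)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Initialise means from random samples, variances from the global variance.
    means = [list(p) for p in rng.sample(data, k)]
    gmean = [sum(p[d] for p in data) / n for d in range(dim)]
    gvar = [max(sum((p[d] - gmean[d]) ** 2 for p in data) / n, var_floor)
            for d in range(dim)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and (floored) variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [max(sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, data)) / nj, var_floor)
                            for d in range(dim)]
    return weights, means, variances

def gmm_loglik(x, model):
    """Log-likelihood of one feature vector under a fitted GMM."""
    weights, means, variances = model
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                      for w, m, v in zip(weights, means, variances)])

def classify(x, models):
    """Return the label whose GMM assigns x the highest log-likelihood."""
    return max(models, key=lambda label: gmm_loglik(x, models[label]))

# Synthetic, well-separated toy data (hypothetical; not from any speech corpus).
rng = random.Random(1)
centers = {"happiness": [10.0, 10.0], "anger": [0.0, 10.0], "sadness": [0.0, 0.0]}
train = {label: [[rng.gauss(c, 1.0) for c in center] for _ in range(40)]
         for label, center in centers.items()}
models = {label: fit_gmm(samples) for label, samples in train.items()}

print(classify([9.5, 10.2], models))
```

In this arrangement, the three systems in the study would differ only in what goes into the feature vector (classical, ToBI-only, or both concatenated); the per-class GMM scoring step stays the same.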
