SVM-MLP-PNN Classifiers on Speech Emotion Recognition Field - A Comparative Study

In this paper, we present a comparative analysis of three classifiers for emotion recognition from speech signals. Recognition was performed on the Berlin emotional speech database. This work covers both speaker- and utterance (phrase)-dependent and independent frameworks. One hundred thirty-three (133) sound/speech features were extracted from pitch, Mel-frequency cepstral coefficients, energy and formants, and were evaluated in order to create a feature set sufficient to discriminate between seven emotions in acted speech. A set of 26 features was selected by a statistical method, and a Multilayer Perceptron, a Probabilistic Neural Network and a Support Vector Machine were used for emotion classification into seven classes: anger, happiness, anxiety/fear, sadness, boredom, disgust and neutral. In the speaker-dependent framework, the Probabilistic Neural Network classifier reached a very high accuracy of 94%, whereas in the speaker-independent framework, the Support Vector Machine classifier reached the best accuracy of 80%. The results of the numerical experiments are given and discussed in the paper.
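
As a rough illustration of the pipeline the abstract describes (utterance-level statistics of pitch, MFCC and energy features, statistical feature selection, and a comparison of SVM, MLP and PNN classifiers), the following Python sketch uses librosa and scikit-learn. The helper names `utterance_features` and `compare_classifiers`, the particular summary statistics, the use of SelectKBest with an ANOVA F-test as a stand-in for the paper's statistical selection method, and the minimal PNN class are all assumptions for illustration; they are not the authors' exact feature set, selection procedure, or classifier settings, and formant features are omitted here.

```python
# Hypothetical sketch of the pipeline described in the abstract; not the
# authors' implementation. Formant extraction (used in the paper) would
# require a separate tool such as Praat.
import numpy as np
import librosa
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def utterance_features(path):
    """Summary statistics of pitch, energy and MFCC contours for one utterance."""
    y, sr = librosa.load(path, sr=16000)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # pitch contour
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 x frames
    energy = librosa.feature.rms(y=y)[0]                # frame-level energy
    feats = []
    for track in [f0, energy] + list(mfcc):
        feats += [np.mean(track), np.std(track), np.min(track), np.max(track)]
    return np.array(feats)


class PNN(BaseEstimator, ClassifierMixin):
    """Minimal probabilistic neural network: one Gaussian kernel per training
    pattern; the class score is the kernel sum over that class's patterns."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def fit(self, X, y):
        self.X_, self.y_ = np.asarray(X), np.asarray(y)
        self.classes_ = np.unique(self.y_)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X):
            d2 = np.sum((self.X_ - x) ** 2, axis=1)
            k = np.exp(-d2 / (2.0 * self.sigma ** 2))
            scores = [k[self.y_ == c].sum() for c in self.classes_]
            preds.append(self.classes_[int(np.argmax(scores))])
        return np.array(preds)


def compare_classifiers(X, y, n_selected=26):
    """Cross-validated comparison of SVM, MLP and PNN on selected features.
    X: utterance feature matrix, y: emotion labels (anger, happiness, ...)."""
    for name, clf in [("SVM", SVC(kernel="rbf", C=10.0)),
                      ("MLP", MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000)),
                      ("PNN", PNN(sigma=1.0))]:
        pipe = make_pipeline(StandardScaler(),
                             SelectKBest(f_classif, k=min(n_selected, X.shape[1])),
                             clf)
        scores = cross_val_score(pipe, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.2f}")
```

In use, one would build `X` by stacking `utterance_features(f)` over the database's wave files and `y` from their emotion labels; speaker-independent evaluation would additionally require splitting folds by speaker rather than the plain cross-validation shown here.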
