Speech Emotion Recognition Using MFCCs Extracted from a Mobile Terminal based on ETSI Front End

The importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken language interfaces in man-machine applications. This paper presents a system for recognizing emotional states based on parameters extracted at the front end of a mobile terminal according to the ETSI ES 202 050 standard. Starting from a vector of features derived from energy and MFCCs, an approach based on genetic algorithms is used to determine a subset of features that allows robust classification of speech into 7 emotional states: anger, joy, sadness, fear, disgust, boredom, and neutral.
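The genetic-algorithm feature selection described above can be sketched as follows. This is an illustrative minimal sketch, not the paper's implementation: a chromosome is a bit mask over the feature vector, and the fitness function here is a toy stand-in (the `TARGET` mask is hypothetical) for the classification accuracy a real system would compute over the emotional-speech corpus.

```python
import random

def ga_feature_selection(num_features, fitness, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Select a feature subset with a simple genetic algorithm.

    A chromosome is a 0/1 mask over the feature vector; fitness scores
    how useful the masked-in features are for classification.
    """
    rng = random.Random(seed)
    # Random initial population of bit masks.
    pop = [[rng.randint(0, 1) for _ in range(num_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the fitter half as the elite.
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, num_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Toy fitness: reward masks close to a hypothetical "informative" subset.
# In the actual system, fitness would be cross-validated classifier
# accuracy on the energy/MFCC-derived features.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1]  # hypothetical, not from the paper

def fitness(mask):
    return sum(m == t for m, t in zip(mask, TARGET))

best = ga_feature_selection(len(TARGET), fitness)
```

Because the elite is carried over unchanged each generation, the best fitness in the population never decreases, which makes this simple scheme a reasonable baseline for wrapper-style feature selection.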
