Audio-visual feature selection and reduction for emotion classification

Recognition of expressed emotion from speech and facial gestures was investigated in experiments on an audio-visual emotional database. A total of 106 audio and 240 visual features were extracted, and a subset was then selected with the Plus l-Take Away r algorithm using a Bhattacharyya distance criterion. In a second step, two linear transformation methods, principal component analysis (PCA) and linear discriminant analysis (LDA), were applied to the selected features, and Gaussian classifiers were used to classify the emotions. LDA features consistently outperformed PCA features, and visual features outperformed audio features under both transformations. Across a range of fusion schemes, the audio-visual results were close to those of the visual features alone. The highest recognition rates, obtained with features selected by the Bhattacharyya criterion and transformed by LDA, were 53% for audio features, 98% for visual features, and 98% for the combined audio-visual features.
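
As a rough illustration of the pipeline described above, the sketch below implements sequential Plus l-Take Away r selection scored by a Bhattacharyya distance criterion, followed by LDA reduction and a Gaussian classifier. This is a minimal sketch, not the authors' implementation: the feature matrix `X` and labels `y`, the search parameters `l=2, r=1`, the diagonal-covariance Bhattacharyya estimate, and the scikit-learn classifier choice (`GaussianNB`) are all assumptions made for illustration.

```python
# Minimal sketch (assumed shapes: X is (n_samples, n_features), y is integer labels).
import numpy as np
from itertools import combinations
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

def bhattacharyya(X, y, idx):
    """Sum of pairwise Bhattacharyya distances between classes, computed on
    the feature subset idx with diagonal Gaussian class models (an assumption;
    full covariances are unstable on small subsets)."""
    cols = X[:, idx]
    score = 0.0
    for a, b in combinations(np.unique(y), 2):
        Xa, Xb = cols[y == a], cols[y == b]
        ma, mb = Xa.mean(axis=0), Xb.mean(axis=0)
        va = Xa.var(axis=0) + 1e-6  # small floor avoids division by zero
        vb = Xb.var(axis=0) + 1e-6
        v = (va + vb) / 2.0
        score += (0.125 * ((ma - mb) ** 2 / v).sum()
                  + 0.5 * np.log(v / np.sqrt(va * vb)).sum())
    return score

def plus_l_take_away_r(X, y, n_select, l=2, r=1):
    """Sequential Plus l-Take Away r search: repeatedly add the l features
    that raise the criterion most, then drop the r features whose removal
    lowers it least."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select:
        for _ in range(l):  # plus-l step: greedy forward additions
            best = max(remaining,
                       key=lambda f: bhattacharyya(X, y, selected + [f]))
            selected.append(best)
            remaining.remove(best)
        for _ in range(r):  # take-away-r step: greedy backward removals
            if len(selected) <= 1:
                break
            worst = max(selected,
                        key=lambda f: bhattacharyya(
                            X, y, [g for g in selected if g != f]))
            selected.remove(worst)
            remaining.append(worst)
    return selected[:n_select]

# Hypothetical usage: select features, reduce with LDA, classify with a
# Gaussian model (subset size 40 is an arbitrary example, not from the paper).
# idx = plus_l_take_away_r(X_train, y_train, n_select=40)
# lda = LinearDiscriminantAnalysis().fit(X_train[:, idx], y_train)
# clf = GaussianNB().fit(lda.transform(X_train[:, idx]), y_train)
# accuracy = clf.score(lda.transform(X_test[:, idx]), y_test)
```

With l > r the net effect is forward selection with limited backtracking, which lets the search undo an early greedy choice once later features make it redundant; swapping the roles of l and r would turn it into backward elimination with reinsertion.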
