Emotion recognition from audio-visual data using rule-based decision-level fusion

Emotion recognition systems aim to identify the emotions of human subjects from underlying data with acceptable accuracy. Audio and visual signals, being the primary modalities of human emotion perception, have received the most attention in the development of intelligent systems for natural interaction. Such a system must automatically identify a person's emotional state from his or her voice and facial image, ideally unaffected by practical constraints. In this work, an audio-visual emotion recognition system has been developed that fuses the two modalities at the decision level. First, separate emotion recognition systems based on speech and on facial expressions were developed and evaluated independently. The speech emotion recognition system was tested on two standard speech emotion databases: the Berlin EMODB database and an Assamese database. The performance of the visual emotion recognition system was analyzed on the eNTERFACE'05 database. A decision rule was then defined to fuse the audio and visual information at the decision level to identify emotions. The proposed multi-modal system was evaluated on the same eNTERFACE'05 database.
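The abstract does not spell out the decision rule itself. As a rough illustration of what rule-based decision-level fusion can look like, the following Python sketch combines the outputs of two hypothetical unimodal classifiers; the function name, the per-class probability inputs, and the confidence-based tie-break are all assumptions for illustration, not the authors' actual rule.

```python
# Minimal sketch of rule-based decision-level fusion, assuming each unimodal
# classifier returns per-class posterior probabilities. The specific rule
# below (accept agreement, otherwise trust the more confident modality) is
# an illustrative assumption, not the rule used in the paper.

def fuse_decisions(audio_probs: dict[str, float],
                   visual_probs: dict[str, float]) -> str:
    """Combine audio and visual predictions with a simple decision rule."""
    audio_label = max(audio_probs, key=audio_probs.get)
    visual_label = max(visual_probs, key=visual_probs.get)

    # Rule 1: if both modalities agree, accept the shared label.
    if audio_label == visual_label:
        return audio_label

    # Rule 2 (assumed): on disagreement, pick the modality whose top
    # class carries the higher confidence.
    if audio_probs[audio_label] >= visual_probs[visual_label]:
        return audio_label
    return visual_label

# Hypothetical classifier outputs over the six eNTERFACE'05 emotions
# (anger, disgust, fear, happiness, sadness, surprise), abbreviated here.
audio = {"anger": 0.55, "happiness": 0.30, "sadness": 0.15}
visual = {"anger": 0.20, "happiness": 0.70, "sadness": 0.10}
print(fuse_decisions(audio, visual))  # -> happiness
```

Decision-level fusion of this kind keeps the two unimodal pipelines fully independent, which is what allows the speech system to be benchmarked on EMODB and the Assamese database while the visual system is benchmarked on eNTERFACE'05 before the two are combined.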
