Multimodal recognition of emotions in car environments

Abstract — In the last few years, automatic multimodal recognition of human emotions has gained considerable interest from the research community. By taking more sources of information into account, multimodal approaches allow for a more reliable estimation of human emotions: they increase the confidence in the results and reduce the ambiguity regarding the emotion conveyed by the separate communication channels. This paper provides a thorough description of a bimodal emotion recognition system based on face and speech analysis. We use hidden Markov models (HMMs) to learn and describe the temporal dynamics of the emotion cues in the visual and acoustic channels. We present the details of all steps involved in the analysis, from the preparation of the multimodal database and the feature extraction to the classification of six prototypic emotions. In addition to the unimodal recognizers, we conduct experiments on both early fusion and decision-level fusion of the visual and audio features. The novelty of our approach consists of the dynamic modelling of emotions using HMMs in combination with Local Binary Patterns (LBPs) (17) as visual features and mel-frequency cepstral coefficients (MFCCs) as audio features. At the same time, we propose a new method for visual feature selection based on the multi-class AdaBoost.M2 classifier. A cross-database method is employed to identify the most relevant features on a unimodal database and then apply them in the multimodal setup. We report the results we have achieved so far with the discussed models. The last part of the paper presents conclusions and discusses possible directions for future research on multimodal emotion recognition.
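To make the pipeline summarized above concrete, the sketch below outlines one plausible realization of the two unimodal recognizers and their decision-level fusion: a uniform LBP histogram per video frame, MFCCs per audio frame, one Gaussian HMM per emotion and per modality, and a weighted sum of per-modality log-likelihoods at decision time. The paper does not publish an implementation, so the library choices (scikit-image, librosa, hmmlearn), the model sizes, and the fusion weight are all assumptions for illustration.

```python
# Illustrative sketch of the bimodal HMM pipeline. Assumptions (not from
# the paper): scikit-image for LBP, librosa for MFCC, hmmlearn for HMMs,
# 5 hidden states, and an equal fusion weight w = 0.5.
import numpy as np
import librosa
from hmmlearn import hmm
from skimage.feature import local_binary_pattern

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def lbp_sequence(frames, P=8, R=1):
    """Uniform LBP histogram per grayscale frame -> (T, P + 2) sequence."""
    feats = []
    for frame in frames:
        lbp = local_binary_pattern(frame, P, R, method="uniform")
        # Uniform LBP with P sampling points yields P + 2 distinct labels.
        hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
        feats.append(hist)
    return np.asarray(feats)

def mfcc_sequence(signal, sr, n_mfcc=13):
    """One MFCC vector per audio frame -> (T, n_mfcc) sequence."""
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

def train_models(sequences_per_emotion, n_states=5):
    """Fit one Gaussian HMM per emotion for a single modality."""
    models = {}
    for emotion, seqs in sequences_per_emotion.items():
        X = np.vstack(seqs)                # stacked feature sequences
        lengths = [len(s) for s in seqs]   # sequence boundaries for hmmlearn
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50)
        m.fit(X, lengths)
        models[emotion] = m
    return models

def classify_bimodal(video_models, audio_models, video_seq, audio_seq, w=0.5):
    """Decision-level fusion: weighted sum of per-frame log-likelihoods."""
    scores = {}
    for emotion in EMOTIONS:
        ll_v = video_models[emotion].score(video_seq) / len(video_seq)
        ll_a = audio_models[emotion].score(audio_seq) / len(audio_seq)
        scores[emotion] = w * ll_v + (1.0 - w) * ll_a
    return max(scores, key=scores.get)
```

Early (feature-level) fusion, the other variant mentioned in the abstract, would instead concatenate time-aligned LBP and MFCC vectors into one sequence and train a single set of HMMs on the joint features.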
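The abstract also mentions visual feature selection with the multi-class AdaBoost.M2 classifier. A common way to turn boosting into a feature selector is to boost depth-one decision stumps, each of which tests a single feature dimension, and keep the set of dimensions the ensemble actually uses. The sketch below follows that idea; note that scikit-learn provides the SAMME multiclass variant rather than AdaBoost.M2, so this is a substitute illustration of the selection scheme, not the paper's exact algorithm.

```python
# Hypothetical feature-selection sketch: boost decision stumps and keep
# the feature indices they split on. scikit-learn's AdaBoostClassifier
# implements SAMME, not AdaBoost.M2, so this only approximates the
# selection method named in the abstract.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def select_features(X, y, n_rounds=100):
    """X: (n_samples, n_features) LBP descriptors; y: emotion labels.
    Returns the sorted indices of the features chosen by the stumps."""
    stump = DecisionTreeClassifier(max_depth=1)
    booster = AdaBoostClassifier(estimator=stump, n_estimators=n_rounds,
                                 algorithm="SAMME")
    booster.fit(X, y)
    # Each fitted stump splits on exactly one feature dimension;
    # skip degenerate single-leaf stumps (feature index < 0).
    chosen = {int(est.tree_.feature[0]) for est in booster.estimators_
              if est.tree_.feature[0] >= 0}
    return sorted(chosen)

# Cross-database use, as described in the abstract: fit the selector on a
# unimodal corpus, then keep only those dimensions in the multimodal setup.
# idx = select_features(X_unimodal, y_unimodal)
# X_multimodal_reduced = X_multimodal[:, idx]
```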

[1] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[2] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, J. Comput. Syst. Sci.

[3] Timothy F. Cootes, et al. Interpreting face images using active appearance models, 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[4] Stefanos D. Kollias, et al. On emotion recognition of faces and of speech using neural networks, fuzzy logic and the ASSESS system, 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000).

[5] Takeo Kanade, et al. Comprehensive database for facial expression analysis, 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition.

[6] Paul A. Viola, et al. Robust Real-Time Face Detection, 2001, Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001).

[7] Franck Davoine, et al. A solution for facial expression representation and recognition, 2002, Signal Process. Image Commun.

[8] Matti Pietikäinen, et al. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[9] Dae-Jong Lee, et al. Emotion recognition from the facial image and speech signal, 2003, SICE 2003 Annual Conference.

[10] Zhigang Deng, et al. Analysis of emotion recognition using facial expressions, speech and multimodal information, 2004, ICMI '04.

[11] Chun Chen, et al. Audio-visual based emotion recognition - a new approach, 2004, CVPR 2004.

[12] Gerhard Rigoll, et al. Bimodal fusion of emotional data in an automotive environment, 2005, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[13] Ling Guan, et al. Recognizing human emotion from audiovisual information, 2005, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[14] Nicu Sebe, et al. Emotion Recognition Based on Joint Visual and Audio Cues, 2006, 18th International Conference on Pattern Recognition (ICPR '06).

[15] Christine L. Lisetti, et al. Toward multimodal fusion of affective cues, 2006, HCM '06.

[16] Ioannis Pitas, et al. The eNTERFACE’05 Audio-Visual Emotion Database, 2006, 22nd International Conference on Data Engineering Workshops (ICDEW '06).

[17] Loïc Kessous, et al. Modeling naturalistic affective states via facial and vocal expressions recognition, 2006, ICMI '06.

[18] W. Minker, et al. Combined Speech-Emotion Recognition for Spoken Human-Computer Interfaces, 2007, IEEE International Conference on Signal Processing and Communications.

[19] Yuxiao Hu, et al. Audio-Visual Spontaneous Emotion Recognition, 2007, Artifical Intelligence for Human Computing.

[20] Björn W. Schuller, et al. Low-Level Fusion of Audio, Video Feature for Multi-Modal Emotion Recognition, 2008, VISAPP.

[21] Kwee-Bo Sim, et al. Emotion Recognition Method Based on Multimodal Sensor Fusion Algorithm, 2008, Int. J. Fuzzy Log. Intell. Syst.

[22] Nasrollah Moghaddam Charkari, et al. Bimodal person-dependent emotion recognition comparison of feature level and decision level information fusion, 2008, PETRA '08.

[23] Benoit Huet, et al. Toward emotion indexing of multimedia excerpts, 2008, International Workshop on Content-Based Multimedia Indexing.

[24] Kai-Tai Song, et al. A New Information Fusion Method for Bimodal Robotic Emotion Recognition, 2008, J. Comput.