Multimodal Emotion Recognition Integrating Affective Speech with Facial Expression

In recent years, emotion recognition has attracted extensive interest in signal processing, artificial intelligence and pattern recognition because of its potential applications to human-computer interaction (HCI). Most previously published work in the field performs emotion recognition using either affective speech or facial expression alone. However, affective speech and facial expression are the two main channels of human emotional expression, as they are the most natural and efficient ways for people to communicate their emotions and intentions. In this paper, we aim to develop a multimodal emotion recognition system that integrates affective speech with facial expression, and we investigate its performance under fusion at the feature level and at the decision level. After extracting acoustic features and facial features related to human emotion expression, a support vector machine (SVM) classifier is employed to perform emotion classification. Experimental results on the benchmark eNTERFACE’05 emotional database indicate that the presented multimodal approach clearly outperforms either single-modality approach, i.e., speech emotion recognition or facial expression recognition. The best performance, obtained with the product rule in decision-level fusion, reaches 67.44%.

Key-Words: Multimodal emotion recognition, affective speech, facial expression, support vector machines, speech emotion recognition, facial expression recognition
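The two fusion strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the emotion label order and the posterior values are all hypothetical, and the per-modality class posteriors are assumed to come from classifiers (such as SVMs with probability outputs) trained separately on speech and on facial features. Feature-level fusion concatenates the two feature vectors before a single classifier is applied; decision-level fusion with the product rule multiplies the per-class posteriors of the two classifiers and picks the class with the largest product.

```python
from math import prod

# Assumed label set and order, for illustration only.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]


def feature_level_fusion(acoustic, facial):
    """Concatenate acoustic and facial features into one vector,
    which would then be passed to a single classifier."""
    return list(acoustic) + list(facial)


def product_rule(posteriors_per_modality):
    """Decision-level fusion: multiply per-class posteriors across
    modalities, then renormalise so the fused scores sum to 1."""
    n_classes = len(posteriors_per_modality[0])
    fused = [prod(p[k] for p in posteriors_per_modality)
             for k in range(n_classes)]
    total = sum(fused)
    if total == 0:
        raise ValueError("all fused scores are zero")
    return [score / total for score in fused]


def predict(posteriors_per_modality, labels=EMOTIONS):
    """Return the label with the highest fused score."""
    fused = product_rule(posteriors_per_modality)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


# Made-up posteriors from a speech classifier and a face classifier.
speech = [0.40, 0.10, 0.10, 0.30, 0.05, 0.05]
face = [0.20, 0.05, 0.05, 0.60, 0.05, 0.05]

print(predict([speech, face]))  # happiness
```

In this toy case the speech classifier alone would choose anger, but the product of the two posteriors favours happiness (0.30 × 0.60 = 0.18 against 0.40 × 0.20 = 0.08), which shows how a confident modality can overrule a less certain one under the product rule.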
