A Novel Multimodal Emotion Recognition Approach for Affective Human Robot Interaction

Facial expressions and speech provide emotional information about the user through multiple communication channels. In this paper, a novel multimodal emotion recognition system based on visual and auditory information processing is proposed. The approach is used in real affective human-robot communication to estimate five emotional states (happiness, anger, fear, sadness and neutral), and it consists of two subsystems with a similar structure. The first subsystem achieves robust facial feature extraction by consecutively applying filters to the edge image and using a Dynamic Bayesian Classifier. A similar classifier is used in the second subsystem, whose input is a set of speech descriptors such as speech rate, energy and pitch. Both subsystems are combined in real time. The results of this multimodal approach show the robustness and accuracy of the methodology with respect to single-modality emotion recognition systems.
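The abstract does not specify how the two subsystems are combined; a minimal sketch of one plausible late-fusion scheme over the five emotion classes is shown below. The function name, the weighting parameter and the log-linear fusion rule are illustrative assumptions, not the authors' method.

```python
import numpy as np

EMOTIONS = ["happiness", "anger", "fear", "sadness", "neutral"]

def fuse_posteriors(p_face, p_speech, w_face=0.5):
    """Weighted log-linear fusion of two per-modality class posteriors.

    p_face, p_speech: probability vectors over EMOTIONS from the facial
    and speech subsystems; w_face controls the relative trust in vision.
    """
    p_face = np.asarray(p_face, dtype=float)
    p_speech = np.asarray(p_speech, dtype=float)
    fused = (p_face ** w_face) * (p_speech ** (1.0 - w_face))
    return fused / fused.sum()  # renormalize to a valid distribution

if __name__ == "__main__":
    # Hypothetical outputs for one synchronized frame/utterance pair.
    p_face = [0.55, 0.10, 0.10, 0.05, 0.20]
    p_speech = [0.35, 0.05, 0.10, 0.10, 0.40]
    fused = fuse_posteriors(p_face, p_speech, w_face=0.6)
    print(EMOTIONS[int(np.argmax(fused))], fused.round(3))
```

In this kind of scheme the fused decision can fall back gracefully on the more reliable modality by adjusting the weight, which is one common motivation for combining visual and auditory cues rather than using either alone.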
