Multimodal approaches for emotion recognition: a survey

Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force feedback are emerging. Despite these important advances, one ingredient necessary for natural interaction is still missing: emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. In many applications, it is therefore desirable for computers to understand human emotions. This paper explores new ways of human-computer interaction that enable the computer to be more aware of the user's emotional and attentional expressions. We present the basic research in the field and recent advances in emotion recognition from facial expressions, voice, and physiological signals, where the different modalities are treated independently. We then describe the challenging problem of multimodal emotion recognition and advocate the use of probabilistic graphical models for fusing the different modalities. We also discuss the difficult issues of obtaining reliable affective data, establishing ground truth for emotion labels, and exploiting unlabeled data.