A hierarchical static-dynamic framework for emotion classification

The goal of emotion classification is to estimate an emotion label given representative data and discriminative features. Humans are very good at deriving high-level representations of emotional state and integrating this information over time to arrive at a final judgment. However, most current emotion classification algorithms do not employ this strategy. This paper presents a hierarchical static-dynamic emotion classification framework that estimates high-level emotional judgments and locally integrates this information over time to arrive at a final estimate of the affective label. The results suggest that this framework leads to more accurate emotion classification than either purely static or purely dynamic strategies.
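To make the two-stage idea concrete, below is a minimal sketch of a hierarchical static-dynamic pipeline. It is an illustration, not the paper's implementation: it assumes each utterance is split into fixed-length feature vectors per short segment, uses an SVM with posterior estimates as the static stage, and integrates segment-level posteriors over time by simple averaging as the dynamic stage. The paper's actual features, static classifier, and temporal integration scheme may differ.

```python
# Hierarchical static-dynamic sketch (illustrative assumptions only):
#   static stage  -> segment-level classifier emitting emotion posteriors
#   dynamic stage -> temporal integration of those posteriors per utterance
import numpy as np
from sklearn.svm import SVC

# --- Static stage: train a segment-level classifier -----------------------
# X_train: (n_segments, n_features) acoustic features for labeled segments
# y_train: (n_segments,) emotion label of the utterance each segment came from
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 12))    # toy stand-in for real features
y_train = rng.integers(0, 4, size=200)  # four toy emotion classes

static_clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)

# --- Dynamic stage: integrate segment posteriors over an utterance --------
def classify_utterance(segment_features: np.ndarray) -> int:
    """Return an utterance-level label by averaging segment posteriors."""
    posteriors = static_clf.predict_proba(segment_features)  # (T, n_classes)
    mean_posterior = posteriors.mean(axis=0)                 # integrate over time
    return int(static_clf.classes_[np.argmax(mean_posterior)])

# Usage: an unseen utterance split into 10 segments
utterance = rng.normal(size=(10, 12))
print(classify_utterance(utterance))
```

Averaging is the simplest possible temporal integrator; a sequence model such as an HMM or a recurrent network could replace it without changing the overall static-then-dynamic structure.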
