Affect analysis in natural human interaction using Joint Hidden Conditional Random Fields

We present a novel approach for multi-modal affect analysis in human interactions that is capable of integrating data from multiple modalities while also taking into account temporal dynamics. Our fusion approach, Joint Hidden Conditional Random Fields (JHRCFs), combines the advantages of purely feature level (early fusion) fusion approaches with late fusion (CRFs on individual modalities) to simultaneously learn the correlations between features from multiple modalities as well as their temporal dynamics. Our approach addresses major shortcomings of other fusion approaches such as the domination of other modalities by a single modality with early fusion and the loss of cross-modal information with late fusion. Extensive results on the AVEC 2011 dataset show that we outperform the state-of-the-art on the Audio Sub-Challenge, while achieving competitive performance on the Video Sub-Challenge and the Audiovisual Sub-Challenge.

[1]  Björn W. Schuller,et al.  AVEC 2011-The First International Audio/Visual Emotion Challenge , 2011, ACII.

[2]  Max Welling,et al.  Hidden-Unit Conditional Random Fields , 2011, AISTATS.

[3]  Larry S. Davis,et al.  Human detection using partial least squares analysis , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Trevor Darrell,et al.  Latent-Dynamic Discriminative Models for Continuous Gesture Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Louis-Philippe Morency,et al.  Modeling Latent Discriminative Dynamic of Multi-dimensional Affective Signals , 2011, ACII.

[6]  Mark A. Clements,et al.  Investigating the Use of Formant Based Features for Detection of Affective Dimensions in Speech , 2011, ACII.

[7]  Robert A. Sottilare,et al.  Using student mood and task performance to train classifier algorithms to select effective coaching strategies within intelligent tutoring systems (its) , 2009 .

[8]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[9]  Bir Bhanu,et al.  A Psychologically-Inspired Match-Score Fusion Model for Video-Based Facial Expression Recognition , 2011, ACII.

[10]  Markus Kächele,et al.  Multiple Classifier Systems for the Classification of Audio-Visual Emotional States , 2011, ACII.

[11]  Daniel Jurafsky,et al.  Hidden Conditional Random Fields for phone recognition , 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[12]  Trevor Darrell,et al.  Hidden Conditional Random Fields , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.