Multimodal Affective Analysis combining Regularized Linear Regression and Boosted Regression Trees

In this paper we present a multimodal approach for affective analysis that exploits features from video, audio, Electrocardiogram (ECG), and Electrodermal Activity (EDA) combining two regression techniques, namely Boosted Regression Trees and Linear Regression. Moreover, we propose a novel regularization approach for the Linear Regression in order to exploit the temporal correlation of the affective dimensions. The final prediction is obtained using a decision level fusion of the regressors individually trained on the different groups of features. The promising results obtained on the benchmark dataset show the efficacy and effectiveness of the proposed approach.

[1]  Björn W. Schuller,et al.  Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies , 2008, INTERSPEECH.

[2]  Peter L. Bartlett,et al.  Boosting Algorithms as Gradient Descent in Function Space , 2007 .

[3]  Tao Qin,et al.  Global Ranking Using Continuous Conditional Random Fields , 2008, NIPS.

[4]  Trevor Darrell,et al.  Latent-Dynamic Discriminative Models for Continuous Gesture Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[6]  Larry S. Davis,et al.  Human detection using partial least squares analysis , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[7]  Björn W. Schuller,et al.  AVEC 2014: 3D Dimensional Affect and Depression Recognition Challenge , 2014, AVEC '14.

[8]  A. N. Tikhonov,et al.  Solutions of ill-posed problems , 1977 .

[9]  Björn W. Schuller,et al.  AVEC 2011-The First International Audio/Visual Emotion Challenge , 2011, ACII.

[10]  Fabien Ringeval,et al.  Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[11]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Behjat Siddiquie,et al.  Affect analysis in natural human interaction using Joint Hidden Conditional Random Fields , 2013, 2013 IEEE International Conference on Multimedia and Expo (ICME).

[13]  Tamás D. Gedeon,et al.  Emotion Recognition In The Wild Challenge 2014: Baseline, Data and Protocol , 2014, ICMI.

[14]  Fabien Ringeval,et al.  AV+EC 2015: The First Affect Recognition Challenge Bridging Across Audio, Video, and Physiological Data , 2015, AVEC@ACM Multimedia.

[15]  Louis-Philippe Morency,et al.  Modeling Latent Discriminative Dynamic of Multi-dimensional Affective Signals , 2011, ACII.

[16]  Thomas F. Coleman,et al.  An Interior Trust Region Approach for Nonlinear Minimization Subject to Bounds , 1993, SIAM J. Optim..

[17]  Björn W. Schuller,et al.  AVEC 2012: the continuous audio/visual emotion challenge , 2012, ICMI '12.

[18]  Shiguang Shan,et al.  Combining Multiple Kernel Methods on Riemannian Manifold for Emotion Recognition in the Wild , 2014, ICMI.

[19]  Colleen Richey,et al.  Emotion detection in speech using deep networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Björn W. Schuller,et al.  AVEC 2013: the continuous audio/visual emotion and depression recognition challenge , 2013, AVEC@ACM Multimedia.

[21]  Geoffrey E. Hinton,et al.  Modeling Human Motion Using Binary Latent Variables , 2006, NIPS.

[22]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[23]  Peter Robinson,et al.  Dimensional affect recognition using Continuous Conditional Random Fields , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[24]  Peng Song,et al.  A feature selection and feature fusion combination method for speaker-independent speech emotion recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[25]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[26]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[27]  Jean-Philippe Thiran,et al.  Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data , 2015, Pattern Recognit. Lett..