Ensemble CCA for Continuous Emotion Prediction

This paper presents our work on ACM MM Audio Visual Emotion Corpus 2014 (AVEC 2014) using the baseline features in accordance with the challenge protocol. For prediction, we use Canonical Correlation Analysis (CCA) in affect sub-challenge (ASC) and Moore-Penrose generalized inverse (MPGI) in depression sub-challenge (DSC). The video baseline provides histograms of Local Gabor Binary Patterns from Three Orthogonal Planes (LGBP-TOP) features. Based on our preliminary experiments on AVEC 2013 challenge data, we focus on the inner facial regions that correspond to eyes and mouth area. We obtain an ensemble of regional linear regressors via CCA and MPGI. We also enrich the 2014 baseline set with Local Phase Quantization (LPQ) features extracted using Intraface toolkit detected/tracked faces. Combining both representations in a CCA ensemble approach, on the challenge test set we reach an average Pearson's Correlation Coefficient (PCC) of 0.3932, outperforming the ASC test set baseline PCC of 0.1966. On the DSC, combining modality specific MPGI based ensemble systems, we reach 9.61 Root Mean Square Error (RMSE).

[1]  Fabien Ringeval,et al.  The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load , 2014, INTERSPEECH.

[2]  Dianhui Wang,et al.  Extreme learning machines: a survey , 2011, Int. J. Mach. Learn. Cybern..

[3]  K. S. Banerjee Generalized Inverse of Matrices and Its Applications , 1973 .

[4]  R. Fisher THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[5]  Victor C. M. Leung,et al.  Extreme Learning Machines [Trends & Controversies] , 2013, IEEE Intelligent Systems.

[6]  Hongming Zhou,et al.  Extreme Learning Machines [Trends & Controversies] , 2013 .

[7]  김용수,et al.  Extreme Learning Machine 기반 퍼지 패턴 분류기 설계 , 2015 .

[8]  Björn W. Schuller,et al.  AVEC 2012: the continuous audio/visual emotion challenge , 2012, ICMI '12.

[9]  Enrique Argones-Rúa,et al.  Audiovisual three-level fusion for continuous estimation of Russell's emotion circumplex , 2013, AVEC@ACM Multimedia.

[10]  Guang-Bin Huang,et al.  Extreme learning machine: a new learning scheme of feedforward neural networks , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[11]  John Shawe-Taylor,et al.  Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.

[12]  Björn W. Schuller,et al.  Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  Fernando De la Torre,et al.  Supervised Descent Method and Its Applications to Face Alignment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machine Classifiers , 1999, Neural Processing Letters.

[15]  Björn Schuller,et al.  Opensmile: the munich versatile and fast open-source audio feature extractor , 2010, ACM Multimedia.

[16]  Albert Ali Salah,et al.  Eyes Whisper Depression: A CCA based Multimodal Approach , 2014, ACM Multimedia.

[17]  Michel F. Valstar,et al.  Local Gabor Binary Patterns from Three Orthogonal Planes for Automatic Facial Expression Recognition , 2013, 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction.

[18]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Vladimir Pavlovic,et al.  Central Subspace Dimensionality Reduction Using Covariance Operators , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Björn W. Schuller,et al.  The INTERSPEECH 2009 emotion challenge , 2009, INTERSPEECH.

[21]  Björn W. Schuller,et al.  AVEC 2014: 3D Dimensional Affect and Depression Recognition Challenge , 2014, AVEC '14.

[22]  Shaogang Gong,et al.  Beyond Facial Expressions: Learning Human Emotion from Body Gestures , 2007, BMVC.

[23]  Roddy Cowie,et al.  AVEC 2012: the continuous audio/visual emotion challenge - an introduction , 2012, ICMI.

[24]  Chee Kheong Siew,et al.  Extreme learning machine: Theory and applications , 2006, Neurocomputing.

[25]  C. Pelachaud,et al.  Emotion-Oriented Systems: The Humaine Handbook , 2011 .

[26]  Fikret S. Gürgen,et al.  A feature selection method based on kernel canonical correlation analysis and the minimum Redundancy-Maximum Relevance filter method , 2012, Expert Syst. Appl..

[27]  Hongming Zhou,et al.  Extreme Learning Machine for Regression and Multiclass Classification , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[28]  Shaogang Gong,et al.  Capturing Correlations Among Facial Parts for Facial Expression Analysis , 2007, BMVC.

[29]  M. Bartlett Further aspects of the theory of multiple regression , 1938, Mathematical Proceedings of the Cambridge Philosophical Society.

[30]  Elmar Nöth,et al.  The INTERSPEECH 2012 Speaker Trait Challenge , 2012, INTERSPEECH.

[31]  Maja Pantic,et al.  Action unit detection using sparse appearance descriptors in space-time video volumes , 2011, Face and Gesture 2011.

[32]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[33]  Fikret S. Gürgen,et al.  Ensemble canonical correlation analysis , 2013, Applied Intelligence.

[34]  A. Beck,et al.  Comparison of Beck Depression Inventories -IA and -II in psychiatric outpatients. , 1996, Journal of personality assessment.

[35]  Fabio Valente,et al.  The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism , 2013, INTERSPEECH.

[36]  Peter L. Bartlett,et al.  The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network , 1998, IEEE Trans. Inf. Theory.

[37]  Björn W. Schuller,et al.  AVEC 2013: the continuous audio/visual emotion and depression recognition challenge , 2013, AVEC@ACM Multimedia.

[38]  Stefanos Zafeiriou,et al.  Correlated-spaces regression for learning continuous emotion dimensions , 2013, MM '13.

[39]  Jeff A. Bilmes,et al.  Deep Canonical Correlation Analysis , 2013, ICML.

[40]  Björn W. Schuller,et al.  CCA based feature selection with application to continuous depression recognition from acoustic speech features , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[41]  Albert Ali Salah,et al.  Canonical correlation analysis and local fisher discriminant analysis based multi-view acoustic feature reduction for physical load prediction , 2014, INTERSPEECH.