Eyes Whisper Depression: A CCA-Based Multimodal Approach

This paper presents our work on the depression recognition sub-challenge of the ACM MM Audio/Visual Emotion Challenge 2013 (AVEC 2013), using the baseline features in accordance with the challenge protocol. We use Canonical Correlation Analysis (CCA) both for audio-visual fusion and for extracting covariates of the target task. The video baseline provides histograms of Local Phase Quantization (LPQ) features extracted from 4×4 = 16 regions of the detected face. We summarize the video features over 20-second segments using mode and range functionals. We observe that range-functional features, which measure the spread of the features, yield statistically significantly higher canonical correlation than mode-functional features, which measure their central tendency. Moreover, when audio-visual features are used with a varying number of covariates per region, the regions consistently found to perform best are those corresponding to the two eyes and the right part of the mouth.
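
The following is a minimal NumPy sketch, under stated assumptions, of the two steps summarized above. The first step summarizes frame-level features over 20-second segments with mode and range functionals. The second step computes canonical correlations between two feature views, such as audio and video functionals. The function names `segment_functionals` and `cca`, the histogram-bin approximation of the mode for continuous values, the small ridge term `reg`, and the synthetic data are all hypothetical illustrations. They are not the authors' implementation. The CCA shown is the textbook whitening-plus-SVD formulation.

```python
import numpy as np


def segment_functionals(frames, fps, seg_seconds=20.0, n_bins=10):
    """Summarize frame-level features (T x D) over non-overlapping segments.

    Returns (mode_feats, range_feats), each of shape (n_segments, D).
    Assumption (hypothetical): the mode of a continuous feature is
    approximated by the centre of its most populated histogram bin.
    """
    seg_len = int(round(seg_seconds * fps))
    n_seg = frames.shape[0] // seg_len  # a trailing partial segment is dropped
    modes, ranges = [], []
    for s in range(n_seg):
        seg = frames[s * seg_len:(s + 1) * seg_len]
        # Range functional: spread of each feature within the segment.
        ranges.append(seg.max(axis=0) - seg.min(axis=0))
        # Mode functional: most frequent (binned) value of each feature.
        row = []
        for d in range(seg.shape[1]):
            counts, edges = np.histogram(seg[:, d], bins=n_bins)
            k = int(np.argmax(counts))
            row.append(0.5 * (edges[k] + edges[k + 1]))
        modes.append(row)
    return np.asarray(modes), np.asarray(ranges)


def _inv_sqrt(c):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v / np.sqrt(w)) @ v.T


def cca(X, Y, reg=1e-6):
    """Canonical correlation analysis between views X (N x p) and Y (N x q).

    Returns (rho, Wx, Wy): canonical correlations in descending order and
    projection matrices whose columns are the canonical directions.
    `reg` is a small ridge term (assumption) that keeps the covariance
    matrices invertible when features are nearly collinear.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, rho, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    return rho, Kx @ U, Ky @ Vt.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in: 60 s of 16-dim frame features at 30 fps.
    frames = rng.random((60 * 30, 16))
    mode_f, range_f = segment_functionals(frames, fps=30)
    print(mode_f.shape, range_f.shape)  # three 20-second segments
    # Two synthetic "views" sharing one latent signal z.
    z = rng.standard_normal((500, 1))
    audio = z @ rng.standard_normal((1, 5)) + 0.5 * rng.standard_normal((500, 5))
    video = z @ rng.standard_normal((1, 8)) + 0.5 * rng.standard_normal((500, 8))
    rho, Wa, Wv = cca(audio, video)
    # One possible fusion (assumption): concatenate the leading canonical variates.
    k = 2
    fused = np.hstack([(audio - audio.mean(0)) @ Wa[:, :k],
                       (video - video.mean(0)) @ Wv[:, :k]])
    print(np.round(rho, 3), fused.shape)
```

You could also set `Y` to the per-recording depression score, reshaped to shape (N, 1). In that case `cca` returns a single canonical correlation, which equals the multiple correlation of a least-squares fit of the score on `X`. That in-sample value is inflated toward 1 as the number of features approaches the number of recordings, so it is best judged on held-out data.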