Combining Global and Local Convolutional 3D Networks for Detecting Depression from Facial Expressions

Deep learning architectures have been successfully applied in video-based health monitoring to recognize distinctive variations in the facial appearance of subjects. To detect patterns of variation linked to depressive behavior, deep neural networks (NNs) typically exploit spatial and temporal information separately, e.g., by cascading a 2D convolutional NN (CNN) with a recurrent NN (RNN), although this separation can degrade the intrinsic spatio-temporal relationships. With the recent advent of 3D CNNs such as the convolutional 3D (C3D) network, these spatio-temporal relationships can be modeled jointly to improve performance. However, the accuracy of C3D networks remains an issue when they are applied to depression detection. In this paper, the fusion of diverse C3D predictions is proposed to improve accuracy, where spatio-temporal features are extracted from global (full-face) and local (eyes) regions of the subject. This allows the model to focus increasingly on a local facial region that is highly relevant for analyzing depression. Additionally, the proposed network integrates 3D Global Average Pooling to efficiently summarize spatio-temporal features without fully-connected layers, thereby reducing the number of model parameters and the risk of over-fitting. Experimental results on the Audio Visual Emotion Challenge (AVEC 2013 and AVEC 2014) depression datasets indicate that combining the responses of global and local C3D networks achieves a higher level of accuracy than state-of-the-art systems.
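The two mechanisms named in the abstract, 3D Global Average Pooling and the fusion of global and local predictions, can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the function names, the toy feature-map sizes, and in particular the weighted-average fusion rule, since the abstract does not state how the two C3D responses are combined. It is not the authors' implementation.

```python
from statistics import fmean


def global_avg_pool_3d(feature_maps):
    """Average each channel over frames, rows and columns.

    feature_maps is a nested list shaped [channels][frames][rows][cols];
    the result holds one float per channel.
    """
    pooled = []
    for channel in feature_maps:
        values = [v for frame in channel for row in frame for v in row]
        pooled.append(fmean(values))
    return pooled


def fc_weight_count(channels, frames, rows, cols, outputs):
    """Weights a fully-connected layer would need on the flattened maps."""
    return channels * frames * rows * cols * outputs


def fuse_scores(global_score, local_score, weight_global=0.5):
    """Late fusion by weighted average (assumed rule, not from the paper)."""
    return weight_global * global_score + (1.0 - weight_global) * local_score


# Toy feature maps: 2 channels, 2 frames, 2x2 spatial grid (hypothetical sizes).
toy = [
    [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]],
    [[[0.0, 0.0], [0.0, 0.0]], [[2.0, 2.0], [2.0, 2.0]]],
]

pooled = global_avg_pool_3d(toy)          # one value per channel
dense_weights = fc_weight_count(2, 2, 2, 2, outputs=1)
pooled_weights = len(pooled) * 1          # one weight per pooled channel

# Hypothetical depression scores from the full-face and eye-region networks.
fused = fuse_scores(global_score=12.0, local_score=8.0)
```

In this toy case pooling reduces the 16 activations to 2 values, so a single-output regression layer needs 2 weights instead of 16, which is the parameter saving the abstract refers to. The fusion weight is a free parameter in this sketch; setting `weight_global` below 0.5 gives more influence to the eye-region network.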
