Video-Based Depression Level Analysis by Encoding Deep Spatiotemporal Features

Depression is a serious mood disorder whose symptoms affect how people feel, think, and handle daily activities such as sleeping, eating, and working. This paper proposes a novel framework that estimates Beck Depression Inventory II (BDI-II) scores from video data. A 3D convolutional neural network automatically learns spatiotemporal features at two different face scales, and a Recurrent Neural Network (RNN) then learns further from the sequence of these spatiotemporal features. This formulation, called RNN-C3D, models both local and global spatiotemporal information from consecutive facial expressions in order to predict depression levels. Experiments on the AVEC2013 and AVEC2014 depression datasets show that the proposed approach is promising when compared with state-of-the-art visual-based depression analysis methods.
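
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 16-frame clip length, the plain tanh recurrence, and the fusion of the two face scales by concatenation are all assumptions made for illustration, since the abstract does not specify them. The two feature extractors are passed in as hypothetical callables that stand in for the trained 3D convolutional networks.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_clips(num_frames: int, clip_len: int = 16, stride: int = 16) -> List[range]:
    """Return frame-index ranges for fixed-length clips (clip_len is an assumed value)."""
    return [range(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]


def rnn_regress(sequence: Sequence[Vector], w_in: List[Vector],
                w_rec: List[Vector], w_out: Vector) -> float:
    """Run a plain tanh RNN over the clip features and map the final state to one score."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        # The comprehension reads the previous hidden state before it is reassigned.
        hidden = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wr * h for wr, h in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return sum(wo * h for wo, h in zip(w_out, hidden))


def predict_score(num_frames: int,
                  small_scale_features: Callable[[range], Vector],
                  large_scale_features: Callable[[range], Vector],
                  w_in: List[Vector], w_rec: List[Vector], w_out: Vector,
                  clip_len: int = 16) -> float:
    """Hypothetical pipeline: clips -> two-scale features -> RNN -> one score per video."""
    clips = split_into_clips(num_frames, clip_len, clip_len)
    # Assumption: the two face scales are fused by concatenating their feature vectors.
    sequence = [small_scale_features(c) + large_scale_features(c) for c in clips]
    return rnn_regress(sequence, w_in, w_rec, w_out)
```

In practice the callables would wrap trained networks applied to face crops at each scale, and the recurrent weights would be learned by regression against the BDI-II labels; the hand-written recurrence here only shows how per-clip features become a sequence that yields a single prediction.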
