Two-stage Temporal Modelling Framework for Video-based Depression Recognition using Graph Representation

Video-based automatic depression analysis provides a fast, objective and repeatable self-assessment solution, and has been widely studied in recent years. Although depression cues may be reflected in facial behaviours at various temporal scales, most existing approaches model depression from either short-term or video-level facial behaviours, but not both. To address this, we propose a two-stage framework that models depression severity from multi-scale short-term and video-level facial behaviours. The short-term depressive behaviour modelling stage first uses deep networks to learn depression-related facial behavioural features at multiple short temporal scales, where a Depression Feature Enhancement (DFE) module is proposed to enhance the depression-related cues at all temporal scales and remove depression-unrelated noise. The video-level depressive behaviour modelling stage then introduces two novel graph encoding strategies, i.e., Sequential Graph Representation (SEG) and Spectral Graph Representation (SPG), which re-encode all short-term features of the target video into a video-level graph representation that summarizes depression-related multi-scale video-level temporal information. As a result, the produced graph representations predict depression severity using both short-term and long-term facial behaviour patterns. Experimental results on the AVEC 2013 and AVEC 2014 datasets show that the proposed DFE module consistently improved depression severity estimation for various CNN models, while SPG is superior to other video-level modelling methods. More importantly, the results achieved by the proposed two-stage framework show promising and solid performance compared with widely used one-stage modelling approaches.
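The idea behind a spectral video-level encoding can be illustrated with a minimal sketch. It is not the SPG construction used in this work, whose details are not given in the abstract. It assumes that the short-term features of a video form a (T, D) array, that each feature channel is summarized by the amplitudes of its lowest `num_freqs` Fourier components, and that each frequency component becomes one graph node linked to its neighbouring frequencies. The function names, the chain-shaped adjacency and the parameter values are all hypothetical.

```python
import numpy as np


def spectral_encode(features: np.ndarray, num_freqs: int) -> np.ndarray:
    """Map a (T, D) sequence of short-term features to a (num_freqs, D) array.

    Assumption: each row of the result holds the amplitude of one low-frequency
    Fourier component for every feature channel, so videos of any length T
    yield node features of the same size.
    """
    num_steps = features.shape[0]
    # Zero-pad short sequences so that at least num_freqs frequency bins exist.
    # Note: bins of videos with different lengths are not frequency-aligned;
    # this sketch ignores that.
    fft_length = max(num_steps, 2 * (num_freqs - 1))
    spectrum = np.fft.rfft(features, n=fft_length, axis=0)
    # Keep the lowest num_freqs bins and normalise by the sequence length.
    return np.abs(spectrum[:num_freqs]) / num_steps


def chain_adjacency(num_nodes: int) -> np.ndarray:
    """Adjacency with self-loops that links each node to its neighbours.

    Assumption: a simple chain over frequency components, chosen only for
    illustration.
    """
    adjacency = np.eye(num_nodes)
    idx = np.arange(num_nodes - 1)
    adjacency[idx, idx + 1] = 1.0
    adjacency[idx + 1, idx] = 1.0
    return adjacency


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_video = rng.normal(size=(50, 8))   # 50 short-term segments, 8-dim features
    long_video = rng.normal(size=(300, 8))   # 300 short-term segments, 8-dim features
    nodes_short = spectral_encode(short_video, num_freqs=16)
    nodes_long = spectral_encode(long_video, num_freqs=16)
    print(nodes_short.shape, nodes_long.shape)
    print(chain_adjacency(16).shape)
```

In this sketch, videos of different lengths map to node features of identical shape, which is the property that lets a single graph model consume whole videos. A real system would also need to align frequencies across videos and would learn the edges rather than fix them.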
