Multimodal Depression Detection based on Factorized Representation

Untreated depression increases the chance of risky behavior, including suicide. However, treatment is often lacking because traditional depression diagnosis can be time-consuming and expensive. Recently, a growing body of evidence suggests that facial motions and language usage differ significantly between depression patients and healthy individuals. In this paper, we devise a novel auto-encoder framework with a multimodal factorization technique for depression detection based on facial images and transcribed texts, aiming to eliminate redundancies and focus on key factors in the visual and textual modalities. It consists of three stages: feature extraction and memory-based modality fusion, multimodal factorization, and reconstruction and prediction. First, high-level features are extracted from facial images and transcribed texts by ResNet-50 and BERT, respectively. These features are also fused by a memory fusion network to obtain cross-modal features. Then, multimodal factorization takes the three kinds of features (visual, textual, and cross-modal) to predict the depression severity and jointly reconstruct the single-modal inputs. We conduct experiments and ablation studies on a self-collected Chinese depression detection dataset to demonstrate the effectiveness and robustness of our method.
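
The three-stage data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the pre-extracted ResNet-50 and BERT features are replaced by random arrays, the memory fusion network is replaced by a single concatenate-and-project layer, all layers are untrained, and every name, dimension, and the equal loss weighting (`linear`, `forward`, `d_lat`, and so on) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Randomly initialised (weights, bias); a stand-in for a trained layer."""
    return rng.normal(0.0, 0.02, size=(in_dim, out_dim)), np.zeros(out_dim)

def apply(layer, x):
    weights, bias = layer
    return x @ weights + bias

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Assumed sizes: 2048-d pooled ResNet-50 features, 768-d BERT features,
# and a hypothetical 64-d latent space.
d_vis, d_txt, d_lat = 2048, 768, 64

params = {
    "fuse": linear(d_vis + d_txt, d_lat),   # placeholder for memory-based fusion
    "enc_vis": linear(d_vis, d_lat),
    "enc_txt": linear(d_txt, d_lat),
    "predict": linear(3 * d_lat, 1),
    "dec_vis": linear(2 * d_lat, d_vis),
    "dec_txt": linear(2 * d_lat, d_txt),
}

def forward(x_vis, x_txt, p):
    # Stage 1: cross-modal features from both modalities (simplified fusion).
    z_cross = np.tanh(apply(p["fuse"], np.concatenate([x_vis, x_txt], axis=1)))
    # Stage 2: modality-specific factors.
    z_vis = np.tanh(apply(p["enc_vis"], x_vis))
    z_txt = np.tanh(apply(p["enc_txt"], x_txt))
    # Stage 3: predict severity from all three factors and reconstruct
    # each single-modal input from its own factor plus the shared one.
    all_factors = np.concatenate([z_vis, z_txt, z_cross], axis=1)
    severity = apply(p["predict"], all_factors).squeeze(-1)
    rec_vis = apply(p["dec_vis"], np.concatenate([z_vis, z_cross], axis=1))
    rec_txt = apply(p["dec_txt"], np.concatenate([z_txt, z_cross], axis=1))
    return severity, rec_vis, rec_txt

# Random stand-ins for a batch of 4 samples and hypothetical severity labels.
x_vis = rng.normal(size=(4, d_vis))
x_txt = rng.normal(size=(4, d_txt))
target = rng.uniform(0.0, 1.0, size=4)

severity, rec_vis, rec_txt = forward(x_vis, x_txt, params)
# Joint objective: prediction error plus both reconstruction errors
# (equal weighting is an assumption).
loss = mse(severity, target) + mse(rec_vis, x_vis) + mse(rec_txt, x_txt)
print(severity.shape, rec_vis.shape, rec_txt.shape, loss)
```

In a trained model the reconstruction terms would encourage the latent factors to retain the information in each modality, while the prediction term ties them to depression severity; here the sketch only shows how the three kinds of features are combined.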
