Multimodal Temporal Machine Learning for Bipolar Disorder and Depression Recognition

Mental disorders are a serious public health concern that affects the lives of millions of people worldwide. Early diagnosis is essential to ensure timely treatment and to improve the well-being of those affected. In this paper, we present a novel multimodal framework for recognizing mental disorders from videos. The proposed approach combines the audio, video and textual modalities. Using recurrent neural network architectures, we incorporate temporal information into the learning process and model the dynamic evolution of the features extracted for each patient. For multimodal fusion, we propose an efficient late fusion strategy based on a simple feed-forward neural network, which we call the adaptive nonlinear judge classifier. We evaluate the proposed framework on two mental disorder datasets. On both, the experimental results show that the proposed framework outperforms state-of-the-art approaches. We also study the importance of each modality for mental disorder recognition and draw conclusions about the temporal nature of each modality. Our findings show that careful consideration of the temporal evolution of each modality is crucial for accurate mental disorder recognition.
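To make the late fusion idea concrete, the following is a minimal sketch of how per-modality predictions could be combined by a small feed-forward network. It is not the authors' implementation: the class name `LateFusionJudge`, the single tanh hidden layer, the hidden size, the three-class setup and the example probabilities are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Affine layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (hypothetical, untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [
        [rng.uniform(-scale, scale) for _ in range(n_in)]
        for _ in range(n_out)
    ]
    return weights, [0.0] * n_out


class LateFusionJudge:
    """Hypothetical feed-forward 'judge' over per-modality class probabilities."""

    def __init__(self, n_modalities, n_classes, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.n_inputs = n_modalities * n_classes
        self.w1, self.b1 = init_layer(self.n_inputs, n_hidden, rng)
        self.w2, self.b2 = init_layer(n_hidden, n_classes, rng)

    def predict_proba(self, modality_probs):
        # Late fusion: concatenate the outputs of the unimodal models.
        fused = [p for probs in modality_probs for p in probs]
        if len(fused) != self.n_inputs:
            raise ValueError("unexpected number of fused inputs")
        # Nonlinear hidden layer (tanh is an assumption), then softmax output.
        hidden = [math.tanh(h) for h in dense(fused, self.w1, self.b1)]
        return softmax(dense(hidden, self.w2, self.b2))


# Made-up class probabilities from three unimodal (e.g. recurrent) models.
audio = [0.6, 0.3, 0.1]
video = [0.2, 0.5, 0.3]
text = [0.4, 0.4, 0.2]

judge = LateFusionJudge(n_modalities=3, n_classes=3)
fused_probs = judge.predict_proba([audio, video, text])
print(fused_probs)
```

In practice such a judge would be trained on held-out predictions of the unimodal models, so that it learns how much to trust each modality; the sketch only shows the forward pass.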
