Multimodal Depression Detection: An Investigation of Features and Fusion Techniques for Automated Systems

by Michelle Renee Morales
Advisor: Rivka Levitan

Depression is a serious illness that affects a large portion of the world’s population. Given the large effect it has on society, it is evident that depression is a serious health issue. This thesis evaluates, at length, how technology may aid in assessing depression. We present an in-depth investigation of features and fusion techniques for depression detection systems. We also present OpenMM: a novel tool for multimodal feature extraction. Lastly, we present novel techniques for multimodal fusion. The contributions of this work add considerably to our knowledge of depression detection systems and have the potential to improve future systems by incorporating that knowledge into their design.
