Automated assessment of psychiatric disorders using speech: A systematic review

Barriers to accessing mental health assessments include cost and stigma. Even when individuals receive professional care, assessments are intermittent and may be limited, partly because psychiatric symptoms are episodic. Machine-learning technology applied to speech samples obtained in the clinic or remotely could therefore one day provide a biomarker to improve diagnosis and treatment. To date, reviews have focused only on using acoustic features from speech to detect depression and schizophrenia. Here, we present the first systematic review of studies using speech for automated assessments across a broader range of psychiatric disorders.
