An Investigation of Emotional Speech in Depression Classification

Assessing depression via speech characteristics is a growing area of interest in quantitative mental health research, with a view to developing a clinical mental health assessment tool. As a mood disorder, depression alters responses to emotional stimuli, which motivates this investigation into the relationship between emotion and depression-affected speech. This paper investigates how emotional information expressed in speech (i.e. arousal, valence, and dominance) contributes to the classification of minimally depressed and moderately-severely depressed individuals. Experiments on a subset of the AVEC 2014 database show that manual emotion ratings alone are discriminative of depression, and that combining rating-based emotion features with acoustic features improves classification between mild and severe depression. Emotion-based data selection is also shown to improve depression classification, and a range of threshold methods are explored. Finally, the experiments demonstrate that automatically predicted emotion ratings can be incorporated into a fully automatic depression classification system, producing a 5% accuracy improvement over an acoustic-only baseline.
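The fusion described above can be illustrated with a minimal sketch: per-recording emotion ratings (arousal, valence, dominance) are concatenated with acoustic features before classification. All data below is synthetic and all dimensions, names, and the choice of an SVM classifier are assumptions for illustration, not the paper's actual pipeline.

```python
# Sketch of feature-level fusion of emotion ratings with acoustic
# features for binary depression classification. Synthetic data only;
# feature dimensions and the classifier choice are assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 80                                 # number of recordings (assumed)
acoustic = rng.normal(size=(n, 20))    # acoustic functionals per recording
emotion = rng.normal(size=(n, 3))      # mean arousal/valence/dominance ratings
labels = rng.integers(0, 2, size=n)    # 0 = minimal, 1 = moderate-severe

# Feature-level fusion: concatenate the two feature views per recording.
fused = np.hstack([acoustic, emotion])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, fused, labels, cv=5)
print(fused.shape, scores.mean())
```

With real data, the emotion block would hold manual (or automatically predicted) dimensional ratings, and cross-validation would be partitioned by speaker to avoid identity leakage.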
