From Joyous to Clinically Depressed: Mood Detection Using Spontaneous Speech

Depression and other mood disorders are common and disabling. We present work towards an objective diagnostic aid to support clinicians, using affective sensing technology with a focus on acoustic and statistical features of spontaneous speech. This work investigates differences in how depressed and healthy control subjects express positive and negative emotions, and whether an initial gender classification step increases the recognition rate. To this end, spontaneous speech from interviews with 30 depressed subjects and 30 healthy controls was analysed, focusing on questions eliciting positive and negative emotions. Using HMMs with GMMs for classification and 30-fold cross-validation, we found that MFCC, energy, and intensity features gave the highest recognition rates when female and male subjects were analysed together. When the dataset was first split by gender, root mean square energy and shimmer features gave the highest recognition rates for female subjects, while voice quality features did so for male subjects. Overall, correct recognition rates from acoustic features were higher for depressed female subjects than for male subjects. Using temporal features, we found that response time and average syllable duration were longer in depressed subjects, while interaction involvement and articulation rate were higher in control subjects.
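The exact feature set and classifier configuration are not specified in this abstract, so the sketch below is only illustrative. It assumes librosa for MFCC and energy extraction, and simplifies the HMM/GMM classifier to one scikit-learn GaussianMixture per class (depressed vs. control), scored by average frame log-likelihood, with a stratified k-fold split standing in for the paper's 30-fold subject-level cross-validation. All function names and parameters here are hypothetical, not taken from the paper.

```python
# Minimal sketch of a GMM-based depressed/control speech classifier.
# Assumptions (not from the paper): librosa features, per-class GMMs
# instead of full HMMs, and stratified k-fold cross-validation.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import StratifiedKFold

def extract_features(wav_path, sr=16000, n_mfcc=13):
    """Frame-level MFCC + log-energy features for one recording."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # (n_mfcc, T)
    energy = librosa.feature.rms(y=y)                          # (1, T)
    return np.vstack([mfcc, np.log(energy + 1e-10)]).T         # (T, n_mfcc+1)

def fit_class_gmms(features, labels, n_components=8):
    """Train one GMM per class on pooled frame-level features."""
    gmms = {}
    for c in np.unique(labels):
        frames = np.vstack([f for f, l in zip(features, labels) if l == c])
        gmms[c] = GaussianMixture(n_components=n_components,
                                  covariance_type='diag',
                                  random_state=0).fit(frames)
    return gmms

def classify(gmms, frames):
    """Pick the class whose GMM gives the highest mean log-likelihood."""
    scores = {c: g.score(frames) for c, g in gmms.items()}
    return max(scores, key=scores.get)

def cross_validate(features, labels, n_splits=30):
    """Subject-level k-fold evaluation (30-fold in the paper)."""
    labels = np.asarray(labels)
    correct = 0
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(np.zeros(len(labels)), labels):
        gmms = fit_class_gmms([features[i] for i in train_idx],
                              labels[train_idx])
        correct += sum(classify(gmms, features[i]) == labels[i]
                       for i in test_idx)
    return correct / len(labels)

# Usage (hypothetical file list and labels):
# feats = [extract_features(p) for p in wav_paths]
# accuracy = cross_validate(feats, subject_labels)
```

With 30 subjects per group, a 30-fold split leaves two subjects per test fold; the per-class GMM scoring is a common baseline simplification of HMM-based classification when temporal structure is not modelled explicitly.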
