Characterising depressed speech for classification

Depression is a serious psychiatric disorder that affects mood, thoughts, and the ability to function in everyday life. This paper investigates the characteristics of depressed speech for the purpose of automatic classification by analysing how different speech features affect classification results. We analysed voiced, unvoiced and mixed speech in order to gain a better understanding of depressed speech and to bridge the gap between physiological and affective computing studies. This understanding may ultimately lead to an objective affective sensing system that supports clinicians in the diagnosis and monitoring of clinical depression. The characteristics of depressed speech were statistically analysed using ANOVA and linked to their classification results using GMM and SVM classifiers. Features were extracted and classified over speech utterances of 30 clinically depressed patients and 30 gender-matched healthy controls in a speaker-independent manner. Most feature classification results were consistent with their statistical characteristics, providing a link between physiological and affective computing studies. The classification results from low-level features were slightly better than those from statistical functional features, indicating a loss of information in the latter. We found that mixed and unvoiced speech were at least as useful as voiced speech in detecting depression.
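The pipeline the abstract describes, training per-class GMMs on low-level feature frames and an SVM on per-speaker statistical functionals, evaluated speaker-independently, can be sketched as follows. This is a minimal illustration on synthetic data: the feature dimensions, frame counts, GMM component count, and the mean/std functionals are assumptions for the sketch, not the paper's actual configuration.

```python
# Hypothetical sketch of speaker-independent depression classification:
# per-class GMMs scored on low-level frames vs. an SVM on per-speaker
# statistical functionals. Data and parameters are illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic low-level features: 20 speakers per class, 50 frames each,
# 12-dim vectors standing in for acoustic features (e.g. MFCCs, prosody).
def make_speakers(n_speakers, shift):
    return [rng.normal(shift, 1.0, size=(50, 12)) for _ in range(n_speakers)]

dep = make_speakers(20, 0.5)   # "depressed" class, shifted feature means
ctl = make_speakers(20, 0.0)   # "control" class

# Hold out one speaker per class so the test is speaker-independent.
train_dep, test_dep = dep[:-1], dep[-1]
train_ctl, test_ctl = ctl[:-1], ctl[-1]

# GMM approach: one model per class on pooled low-level frames;
# classify a speaker by comparing average log-likelihoods.
gmm_dep = GaussianMixture(n_components=4, random_state=0).fit(np.vstack(train_dep))
gmm_ctl = GaussianMixture(n_components=4, random_state=0).fit(np.vstack(train_ctl))

def gmm_predict(frames):
    return int(gmm_dep.score(frames) > gmm_ctl.score(frames))  # 1 = depressed

# SVM approach: collapse each speaker's frames into one statistical
# functional vector (here simply mean and std per dimension).
def functionals(frames):
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

X = np.array([functionals(f) for f in train_dep + train_ctl])
y = np.array([1] * len(train_dep) + [0] * len(train_ctl))
svm = SVC(kernel="rbf").fit(X, y)

print("GMM:", gmm_predict(test_dep), gmm_predict(test_ctl))
print("SVM:", svm.predict([functionals(test_dep)])[0],
      svm.predict([functionals(test_ctl)])[0])
```

The contrast between the two branches mirrors the abstract's finding: the GMM sees every low-level frame, while the SVM sees only summary functionals, so information can be lost in the collapse to per-speaker statistics.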
