A comparative study of different classifiers for detecting depression from spontaneous speech

Accurate detection of depression from spontaneous speech could provide an objective diagnostic aid that assists clinicians in diagnosing depression. Little attention has been given so far to which classifier performs best for this task. In this study, using a 60-subject real-world clinically validated dataset, we compare three popular classifiers from the affective computing literature - Gaussian Mixture Models (GMM), Support Vector Machines (SVM) and Multilayer Perceptron neural networks (MLP) - as well as the recently proposed Hierarchical Fuzzy Signature (HFS) classifier. Among these, a hybrid classifier combining GMM models and SVM gave the best overall classification results. Comparing feature, score and decision fusion, score fusion performed better for GMM, HFS and MLP, while decision fusion worked best for SVM (both on raw data and on GMM models). Feature fusion performed worse than the other fusion methods in this study. We found that loudness, root mean square and intensity were the voice features that best detected depression in this dataset.
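The distinction between score fusion and decision fusion drawn in the abstract can be sketched as follows. This is a minimal illustration with hypothetical scores, not the study's implementation: score fusion averages per-classifier scores before applying a single threshold, while decision fusion thresholds each classifier first and takes a majority vote.

```python
def score_fusion(scores, threshold=0.5):
    """Average per-classifier scores, then apply one decision threshold."""
    fused = sum(scores) / len(scores)
    return int(fused >= threshold)

def decision_fusion(scores, threshold=0.5):
    """Threshold each classifier's score first, then take a majority vote."""
    votes = [int(s >= threshold) for s in scores]
    return int(sum(votes) > len(votes) / 2)

# Hypothetical example: three classifiers score one speech sample
# (1 = "depressed"). One confident classifier can tip the fused score
# past the threshold even when it is outvoted at the decision level.
scores = [0.9, 0.4, 0.45]
print(score_fusion(scores))     # mean 0.583 >= 0.5 -> 1
print(decision_fusion(scores))  # votes [1, 0, 0]  -> 0
```

Feature fusion, the third strategy compared in the study, instead concatenates the feature vectors of all streams and trains a single classifier on the combined representation.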
