A study of acoustic features for the classification of depressed speech

Soft biometrics comprises the biological traits that are not sufficient for person authentication but can help to narrow the search space. Evidence of a person's mental health state can be considered a soft biometric, as it provides valuable information about the identity of an individual. Different approaches have been used for the automatic classification of speech into “depressed” or “non-depressed”, but differences in algorithms, features, databases and performance measures make it difficult to conclude which features and techniques are most suitable for this task. In this work, the performance of different acoustic features for the classification of depression in speech was studied in the framework of the Audio/Visual Emotion Challenge (AVEC 2013). To do so, the audio data were segmented and projected into a total variability subspace, and the projected data were then used to estimate the depression level by cosine distance scoring and majority voting.
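The final scoring stage described above can be illustrated with a minimal sketch. It assumes that each audio segment has already been projected into the total variability subspace (giving one vector per segment) and that one reference vector per depression level is available. The function names, the labels "low" and "high", and the toy three-dimensional vectors are all hypothetical and are not taken from the paper. Each segment is assigned the level whose reference vector is most similar under the cosine measure, and the recording receives the level chosen by most segments.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def score_segment(segment_vec, references):
    """Return the label whose reference vector is closest to the segment."""
    return max(
        references,
        key=lambda label: cosine_similarity(segment_vec, references[label]),
    )


def classify_recording(segment_vecs, references):
    """Label every segment, then take a majority vote over the segment labels."""
    votes = Counter(score_segment(v, references) for v in segment_vecs)
    return votes.most_common(1)[0][0]


# Toy data (assumed): one reference vector per level, three projected segments.
references = {"low": [1.0, 0.1, 0.0], "high": [0.0, 0.2, 1.0]}
segments = [[0.9, 0.2, 0.1], [0.1, 0.1, 0.8], [0.8, 0.0, 0.2]]
predicted = classify_recording(segments, references)
```

In this toy case two of the three segments lie closer to the "low" reference, so the vote returns "low". With tied votes, `Counter.most_common` returns the label that was counted first, so a real system would need an explicit tie-breaking rule.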
