Comparison of read and spontaneous speech for automatic detection of depression

In this paper, read and spontaneous speech are compared in the context of automatic depression detection by speech processing. First, a statistical analysis was carried out to select the acoustic features that differ significantly between healthy and depressed subjects for these two types of speech, separately for each gender. Second, statistical examination and classification experiments were performed to compare the values of the selected features across the two types of speech. We sought to answer which type of speech yields better automatic depression detection results. As expected, tempo-related features, such as articulation rate, speech rate, and pause lengths, are useful for spontaneous speech, while formant trajectories can be used only for read speech, because their values are mainly influenced by the linguistic content of the utterance. Despite the significant differences in feature values between read and spontaneous speech, there were no major differences in detection accuracy: 83% was achieved with read speech samples and 86% with spontaneous speech samples.
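The two-step pipeline described above (significance-based selection of acoustic features, followed by SVM classification, for which the authors cite LIBSVM) can be sketched as follows. This is a minimal illustration on synthetic stand-in data: the feature values, group sizes, effect sizes, and the Mann-Whitney test choice are assumptions for the sketch, not the paper's actual corpus or feature set.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data: rows = speakers, columns = acoustic features
# (imagine articulation rate, speech rate, mean pause length, etc.).
n_per_group, n_features = 40, 6
healthy = rng.normal(0.0, 1.0, (n_per_group, n_features))
depressed = rng.normal(0.0, 1.0, (n_per_group, n_features))
depressed[:, :3] += 1.2  # pretend the first three features actually differ

X = np.vstack([healthy, depressed])
y = np.array([0] * n_per_group + [1] * n_per_group)  # 0 = healthy, 1 = depressed

# Step 1: keep only features that differ significantly between the groups.
selected = [
    j for j in range(n_features)
    if mannwhitneyu(healthy[:, j], depressed[:, j]).pvalue < 0.05
]

# Step 2: classify with an SVM on the selected features, cross-validated.
scores = cross_val_score(SVC(kernel="rbf"), X[:, selected], y, cv=5)
print("selected features:", selected)
print("mean CV accuracy: %.2f" % scores.mean())
```

In the paper this selection is run separately for each gender and for each speech type (read vs. spontaneous), so the selected feature sets, and hence the classifiers, differ across the four conditions.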
