Modeling spectral variability for the classification of depressed speech

Quantifying how the spectral content of speech relates to changes in mental state may be crucial in building an objective speech-based depression classification system with clinical utility. This paper investigates the hypothesis that important depression-related information can be captured within the covariance structure of a Gaussian Mixture Model (GMM) of recorded speech. Significant negative correlations found between a speaker's average weighted variance (a GMM-based indicator of speaker variability) and their level of depression support this hypothesis. Further evidence is provided by comparing the classification accuracies of seven GMM-UBM systems, each formed by adapting a different combination of parameters during MAP adaptation. This analysis shows that variance-only adaptation either outperforms or matches the de facto standard mean-only adaptation when classifying both the presence and severity of depression. This result is perhaps the first of its kind in GMM-UBM speech classification.
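To make the two GMM-based quantities in the abstract concrete, the sketch below (Python with NumPy and scikit-learn; not the paper's implementation) computes a mixture-weight-weighted average of per-component variances as one plausible reading of "average weighted variance", and performs relevance-MAP adaptation of a diagonal-covariance UBM in which the mean and variance updates can be switched on independently, following the standard equations of Reynolds et al. (2000). Diagonal covariances are assumed, as is common in GMM-UBM speech systems; names such as average_weighted_variance and map_adapt are illustrative, not taken from the paper.

```python
# Minimal sketch, assuming diagonal-covariance GMMs and MFCC-like frame features.
import numpy as np
from sklearn.mixture import GaussianMixture


def average_weighted_variance(gmm: GaussianMixture) -> float:
    """Mixture-weight-weighted variance, averaged over feature dimensions."""
    # covariances_ has shape (n_components, n_features) for 'diag' covariances.
    per_component = gmm.covariances_.mean(axis=1)       # mean variance per component
    return float(np.dot(gmm.weights_, per_component))   # weight by mixture weights


def map_adapt(ubm: GaussianMixture, X: np.ndarray, relevance: float = 16.0,
              adapt_means: bool = True, adapt_vars: bool = False):
    """Relevance-MAP adaptation of a diagonal-covariance UBM to frames X."""
    post = ubm.predict_proba(X)                 # (T, M) component responsibilities
    n = post.sum(axis=0) + 1e-10                # soft counts per component
    Ex = post.T @ X / n[:, None]                # first-order statistics
    Ex2 = post.T @ (X ** 2) / n[:, None]        # second-order statistics
    alpha = (n / (n + relevance))[:, None]      # data-dependent adaptation weight

    means, variances = ubm.means_.copy(), ubm.covariances_.copy()
    if adapt_means:
        means = alpha * Ex + (1.0 - alpha) * ubm.means_
    if adapt_vars:
        variances = (alpha * Ex2
                     + (1.0 - alpha) * (ubm.covariances_ + ubm.means_ ** 2)
                     - means ** 2)
        variances = np.maximum(variances, 1e-6)  # floor to keep variances positive
    return means, variances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ubm_frames = rng.normal(size=(2000, 13))     # stand-in for background MFCC frames
    speaker_frames = rng.normal(size=(300, 13))  # stand-in for one speaker's frames

    ubm = GaussianMixture(n_components=8, covariance_type="diag",
                          random_state=0).fit(ubm_frames)
    print("average weighted variance:", average_weighted_variance(ubm))

    # Variance-only adaptation, the configuration the paper finds competitive
    # with the usual mean-only adaptation.
    _, adapted_vars = map_adapt(ubm, speaker_frames,
                                adapt_means=False, adapt_vars=True)
```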
