Analysis of acoustic space variability in speech affected by depression

We present the novel Probabilistic Acoustic Volume, a robust acoustic variability measure. As depression increases, phonetic events become concentrated in acoustic space. The MFCC feature space becomes tightly concentrated with increasing depression. The speech trajectory in acoustic space becomes smoother with increasing depression. The choice of speech collection paradigm may adversely affect depression detection.

The spectral and energy properties of speech have consistently been observed to change with a speaker's level of clinical depression. As a result, spectral and energy based features are a key component in many speech-based classification and prediction systems. However, there has been no in-depth investigation into how acoustic models of spectral features are affected by depression. This paper investigates the hypothesis that the effects of depression in speech manifest as a reduction in the spread of phonetic events in acoustic space, as modelled by Gaussian Mixture Models (GMM) in combination with Mel Frequency Cepstral Coefficients (MFCC). Our investigation uses three measures of acoustic variability: Average Weighted Variance (AWV), Acoustic Movement (AM) and Acoustic Volume. AWV and Acoustic Volume attempt to model depression-specific acoustic variations, while AM models the trajectory of speech in the acoustic space. Within our analysis we present the Probabilistic Acoustic Volume (PAV), a novel method for robustly estimating Acoustic Volume using Monte Carlo sampling of the feature distribution being modelled. We show that an array of PAV points gives insight into how the concentration of the feature vectors in the feature space changes with depression.
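The idea of estimating a volume from Monte Carlo samples of a modelled feature distribution can be illustrated with a minimal sketch. This is not the paper's PAV definition, which the abstract does not give. It assumes a diagonal-covariance GMM and, as a stand-in volume, the log-volume of the axis-aligned box that spans the central fraction of samples in each dimension separately. The function names, the fractions and the sample count are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_diag_gmm(weights, means, variances, n_samples, rng):
    """Draw Monte Carlo samples from a diagonal-covariance GMM."""
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Pick a mixture component for every sample, then draw from that Gaussian.
    comp = rng.choice(len(weights), size=n_samples, p=weights / weights.sum())
    noise = rng.standard_normal((n_samples, means.shape[1]))
    return means[comp] + noise * np.sqrt(variances[comp])


def log_volume_at(samples, fraction):
    """Log-volume of the box spanning the central `fraction` of samples per dimension.

    Assumption: this box is only an illustrative volume proxy, not the
    measure used in the paper.
    """
    tail = 100.0 * (1.0 - fraction) / 2.0
    lo = np.percentile(samples, tail, axis=0)
    hi = np.percentile(samples, 100.0 - tail, axis=0)
    return float(np.sum(np.log(hi - lo)))


def volume_profile(weights, means, variances,
                   fractions=(0.5, 0.75, 0.95), n_samples=20000, seed=0):
    """Evaluate the volume proxy at several fractions (an 'array of points')."""
    rng = np.random.default_rng(seed)
    samples = sample_diag_gmm(weights, means, variances, n_samples, rng)
    return [log_volume_at(samples, f) for f in fractions]
```

Under these assumptions, a model whose components have smaller variances yields smaller values at every fraction, which is the kind of concentration effect the hypothesis describes. Comparing the whole profile, rather than a single number, shows where in the distribution the concentration occurs.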
Key results - found on two commonly used depression corpora - consistently indicate that as a speaker's level of depression increases there are statistically significant reductions in both AWV (-0.44≤rs≤-0.18 with p<.05) and AM (-0.26≤rs≤-0.19 with p<.05) values, indicating a decrease in localised acoustic variance and a smoothing of the acoustic trajectory respectively. Further, there are also statistically significant reductions (-0.32≤rs≤-0.20 with p<.05) in Acoustic Volume measures and strong statistical evidence (-0.48≤rs≤-0.23 with p<.05) that the MFCC feature space becomes more concentrated. Quantifying these effects is expected to be a key step towards building an objective classification or prediction system which is robust to many of the unwanted - in terms of depression analysis - sources of variability modulated into a speech signal.
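The reported rs values read as Spearman rank correlations between depression level and each variability measure; that reading is an assumption, since the abstract does not name the statistic. A minimal sketch of the computation follows, using only NumPy. The function names are hypothetical, and the significance test behind the reported p-values is not reproduced here.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Tied values share the mean of the ranks they occupy.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rs(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])


# Toy, made-up values purely to show the call; not data from the paper.
toy_depression_scores = [3, 8, 12, 17, 21, 25]
toy_variability_measure = [0.91, 0.88, 0.90, 0.71, 0.69, 0.60]
rs = spearman_rs(toy_depression_scores, toy_variability_measure)
```

A negative rs means the variability measure tends to fall as the depression score rises, which is the direction of every range reported above.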
