Vocal and Facial Biomarkers of Depression Based on Motor Incoordination and Timing

In individuals with major depressive disorder, neurophysiological changes often alter motor control and thus affect the mechanisms that control speech production and facial expression. These changes are typically associated with psychomotor retardation, a condition marked by slowed neuromotor output that manifests behaviorally as altered coordination and timing across multiple motor-based properties. Such changes in motor output can be inferred from vocal acoustics and facial movements as individuals speak. We derive novel multi-scale correlation structure and timing feature sets from audio-based vocal features and video-based facial action units, using recordings provided by the 4th International Audio/Visual Emotion Challenge (AVEC 2014). These feature sets enable detection of changes in the coordination, movement, and timing of vocal and facial gestures that are potentially symptomatic of depression. By combining complementary features in Gaussian mixture model and extreme learning machine classifiers, our multivariate regression scheme predicts Beck Depression Inventory (BDI) ratings on the AVEC test set with a root-mean-square error of 8.12 and a mean absolute error of 6.31. Future work calls for continued study of the detection of neurological disorders based on altered coordination and timing across audio and video modalities.
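The idea behind multi-scale correlation structure features can be illustrated with a minimal sketch. It assumes a channel-delay formulation: each low-level feature stream (for example, formant tracks or facial action-unit intensities, one row per channel) is stacked with time-delayed copies of itself at a given delay spacing, and the eigenvalues of the resulting correlation matrix summarize how the channels co-vary at that time scale. Repeating this at several spacings gives a multi-scale feature vector. The function names, the 15-delay default, and the delay spacings (1, 3, 7, 15 frames) are hypothetical illustrative choices, not the exact settings used in this work.

```python
import numpy as np


def channel_delay_eigenspectrum(x, delay_step, num_delays=15):
    """Eigenvalues (descending) of a channel-delay correlation matrix.

    x          : array of shape (channels, frames), e.g. formant or AU tracks.
    delay_step : spacing in frames between successive delays (the time scale).
    num_delays : delayed copies per channel (hypothetical default).
    """
    x = np.asarray(x, dtype=float)
    n_ch, n_frames = x.shape
    span = (num_delays - 1) * delay_step
    if span >= n_frames:
        raise ValueError("signal too short for the requested delays")
    usable = n_frames - span
    # Row (d * n_ch + c) holds channel c shifted by d * delay_step frames.
    stacked = np.vstack(
        [x[:, d * delay_step : d * delay_step + usable] for d in range(num_delays)]
    )
    corr = np.corrcoef(stacked)      # (n_ch * num_delays) square correlation matrix
    eig = np.linalg.eigvalsh(corr)   # symmetric matrix: real eigenvalues, ascending
    return eig[::-1]                 # return in descending order


def multiscale_features(x, delay_steps=(1, 3, 7, 15), num_delays=15):
    """Concatenate eigenspectra across several delay spacings (hypothetical scales)."""
    return np.concatenate(
        [channel_delay_eigenspectrum(x, s, num_delays) for s in delay_steps]
    )
```

Because a correlation matrix has unit diagonal, each eigenspectrum sums to the matrix dimension; a spectrum concentrated in a few large eigenvalues indicates strongly correlated channels, while a flatter spectrum indicates more independent ones. A constant channel makes `np.corrcoef` return NaN, so such channels would need to be removed or jittered first.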