Detecting Unipolar and Bipolar Depressive Disorders from Elicited Speech Responses Using Latent Affective Structure Model

Mood disorders, including unipolar depression (UD) and bipolar disorder (BD) [1], are among the most common mental illnesses. In diagnostic evaluations of outpatients with mood disorders, a large proportion of BD patients are initially misdiagnosed as having UD [2]. Most previous research has focused on long-term monitoring of mood disorders; short-term detection, which would enable early detection and intervention, is therefore desirable. This work proposes an approach to short-term detection of mood disorders based on the emotional patterns in elicited speech responses. To the best of our knowledge, no database currently exists for short-term discrimination between BD and UD. This work therefore uses two databases: an emotional database (MHMC-EM) collected by the Multimedia Human Machine Communication (MHMC) lab and a mood disorder database (CHI-MEI) collected by the CHI-MEI Medical Center, Taiwan. Because the CHI-MEI mood disorder database is quite small and emotion annotation is difficult, the MHMC-EM emotional database serves as a reference database for data adaptation. For the CHI-MEI data collection, six emotion-eliciting video clips are selected and used to elicit the participants' emotions. After watching each of the six clips, the participants answer questions posed by a clinician, and these speech responses are used to construct the CHI-MEI mood disorder database. Hierarchical spectral clustering is used to adapt the MHMC-EM emotional database to the CHI-MEI mood disorder database, mitigating the data bias between the two. The adapted MHMC-EM emotional data are then fed to a denoising autoencoder for bottleneck feature extraction, and the bottleneck features are used to train a long short-term memory (LSTM)-based emotion detector that generates an emotion profile for each speech response.
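The denoising-autoencoder bottleneck step can be sketched as a one-hidden-layer autoencoder with tied weights that is trained to reconstruct the clean input from a noise-corrupted copy, after which the hidden activations serve as compact features. The NumPy sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the input dimensions (100 frames, 20 raw features), the bottleneck size (8 units), the Gaussian corruption level, and the learning schedule are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dae_bottleneck(X, hidden=8, noise=0.1, lr=0.01, epochs=200):
    """One-hidden-layer denoising autoencoder with tied weights.

    Corrupts each input with Gaussian noise, trains to reconstruct the
    CLEAN input via mean-squared error, and returns the hidden
    (bottleneck) activations for the clean inputs.
    """
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden))
    b = np.zeros(hidden)                          # encoder bias
    c = np.zeros(d)                               # decoder bias
    for _ in range(epochs):
        Xn = X + noise * rng.normal(size=X.shape)  # corrupt the input
        H = np.tanh(Xn @ W + b)                    # encode
        Xhat = H @ W.T + c                         # decode (tied weights)
        g = (Xhat - X) / n                         # dLoss/dXhat (MSE vs. clean X)
        dZ = (g @ W) * (1.0 - H**2)                # backprop through tanh
        W -= lr * (g.T @ H + Xn.T @ dZ)            # tied weights: both paths
        b -= lr * dZ.sum(axis=0)
        c -= lr * g.sum(axis=0)
    return np.tanh(X @ W + b)                      # bottleneck features

# Hypothetical acoustic feature matrix: 100 frames x 20 raw features.
X = rng.normal(size=(100, 20))
B = dae_bottleneck(X)   # bottleneck features, shape (100, 8)
```

Because the hidden layer is much narrower than the input and the network must undo the injected noise, the bottleneck is pushed toward a robust, compressed representation rather than an identity mapping.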
The emotion profiles are then clustered into emotion codewords using the K-means algorithm. Finally, a class-specific latent affective structure model (LASM) is proposed to model the structural relationships among the emotion codewords across the six emotional videos for mood disorder detection. A leave-one-group-out cross-validation scheme was employed to evaluate the proposed class-specific LASM-based approaches. Experimental results show that the proposed class-specific LASM-based method achieved an accuracy of 73.33 percent for mood disorder detection, outperforming SVM- and LSTM-based classifiers.
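The codeword step above can be illustrated with a plain K-means pass over emotion profiles: each profile (a score vector over emotion categories) is assigned to its nearest centroid, and the centroid index becomes the response's codeword. The sketch below is a self-contained NumPy illustration with hypothetical data (60 responses, four emotion categories, eight codewords); it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k=8, iters=50):
    """Plain K-means; returns (centroids, labels)."""
    cent = X[rng.choice(len(X), size=k, replace=False)]  # init from data points
    for _ in range(iters):
        # squared distance of every point to every centroid
        d2 = ((X[:, None, :] - cent[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)                        # nearest-centroid assignment
        for j in range(k):
            pts = X[labels == j]
            if len(pts):                                  # keep old centroid if empty
                cent[j] = pts.mean(axis=0)
    return cent, labels

# Hypothetical emotion profiles: one row per speech response, columns are
# posterior scores over 4 emotion categories (each row sums to 1).
profiles = rng.dirichlet(np.ones(4), size=60)
codebook, codewords = kmeans(profiles, k=8)
# codewords[i] is the emotion codeword index for response i;
# codebook holds the 8 codeword prototypes.
```

Once each response is reduced to a codeword index, a structural model such as the proposed LASM can operate on the codeword sequence rather than on raw frame-level features.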

[1] Björn W. Schuller et al., Incremental acoustic valence recognition: an inter-corpus perspective on features, matching, and performance in a gating paradigm, 2010, INTERSPEECH.

[2] Tanvir Singh et al., Misdiagnosis of bipolar disorder, 2006, Psychiatry (Edgmont).

[3] Roy H. Perlis et al., Misdiagnosis of bipolar disorder, 2005, The American Journal of Managed Care.

[4] D. Luckenbaugh et al., Perception of facial emotion in adults with bipolar or unipolar depression and controls, 2010, Journal of Psychiatric Research.

[5] Nicholas B. Allen et al., Multichannel Weighted Speech Classification System for Prediction of Major Depression in Adolescents, 2013, IEEE Transactions on Biomedical Engineering.

[6] Chung-Hsien Wu et al., Survey on audiovisual emotion recognition: databases, features, and data fusion strategies, 2014, APSIPA Transactions on Signal and Information Processing.

[7] Emily Mower Provost et al., Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech, 2014, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8] R. G. Shiavi et al., Distinguishing depression and suicidal risk in men using GMM based frequency contents of affective vocal tract response, 2008, International Conference on Control, Automation and Systems.

[9] Chung-Hsien Wu et al., Speaking Effect Removal on Emotion Recognition From Facial Expressions Based on Eigenface Conversion, 2013, IEEE Transactions on Multimedia.

[10] Sadaoki Furui, Unsupervised speaker adaptation based on hierarchical spectral clustering, 1989, IEEE Transactions on Acoustics, Speech, and Signal Processing.

[11] D. Mitchell Wilkes et al., Acoustical properties of speech as indicators of depression and suicidal risk, 2000, IEEE Transactions on Biomedical Engineering.

[12] Björn Schuller et al., openSMILE: the Munich versatile and fast open-source audio feature extractor, 2010, ACM Multimedia.

[13] Chung-Hsien Wu et al., Unipolar Depression vs. Bipolar Disorder: An Elicitation-Based Approach to Short-Term Detection of Mood Disorder, 2016, INTERSPEECH.

[14] Fernando De la Torre et al., Detecting depression from facial actions and vocal prosody, 2009, International Conference on Affective Computing and Intelligent Interaction and Workshops.

[15] Chung-Hsien Wu et al., Two-Level Hierarchical Alignment for Semi-Coupled HMM-Based Audiovisual Emotion Recognition With Temporal Course, 2013, IEEE Transactions on Multimedia.

[16] J. Calabrese et al., Development and validation of a screening instrument for bipolar spectrum disorder: the Mood Disorder Questionnaire, 2000, The American Journal of Psychiatry.

[17] Emily Mower Provost et al., Mood state prediction from speech of varying acoustic quality for individuals with bipolar disorder, 2016, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18] Newton Howard et al., Approach Towards a Natural Language Analysis for Diagnosing Mood Disorders and Comorbid Conditions, 2013, Mexican International Conference on Artificial Intelligence.

[19] J. Gross et al., Emotion elicitation using films, 1995.

[20] Björn W. Schuller et al., Modeling gender information for emotion recognition using denoising autoencoder, 2014, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21] Wootaek Lim et al., Speech emotion recognition using convolutional and recurrent neural networks, 2016, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).

[22] Wang Fei et al., Research on speech emotion recognition based on deep auto-encoder, 2016, IEEE International Conference on Cyber Technology in Automation, Control, and Intelligent Systems (CYBER).

[23] Oscar Mayora-Ibarra et al., Smartphone-Based Recognition of States and State Changes in Bipolar Disorder Patients, 2015, IEEE Journal of Biomedical and Health Informatics.

[24] Jürgen Schmidhuber et al., Long Short-Term Memory, 1997, Neural Computation.

[25] Chung-Hsien Wu et al., Data collection of elicited facial expressions and speech responses for mood disorder detection, 2015, International Conference on Orange Technologies (ICOT).

[26] M. Hamilton, A Rating Scale for Depression, 1960, Journal of Neurology, Neurosurgery, and Psychiatry.

[27] T. Barnes, A Rating Scale for Drug-Induced Akathisia, 1989, British Journal of Psychiatry.

[28] F. S. Bersani et al., Facial expression in patients with bipolar disorder and schizophrenia in response to emotional stimuli: a partially shared cognitive and social deficit of the two disorders, 2013, Neuropsychiatric Disease and Treatment.

[29] Chung-Hsien Wu et al., Code-Switching Event Detection by Using a Latent Language Space Model and the Delta-Bayesian Information Criterion, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[30] R. Paradiso et al., Monitoring physiological and behavioral signals to detect mood changes of bipolar patients, 2011, International Symposium on Medical Information and Communication Technology.

[31] Dimitra Vergyri et al., Using Prosodic and Spectral Features in Detecting Depression in Elderly Males, 2011, INTERSPEECH.

[32] Emily Mower Provost et al., Identifying salient sub-utterance emotion dynamics using flexible units and estimates of affective flow, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[33] R. C. Young et al., A Rating Scale for Mania: Reliability, Validity and Sensitivity, 1978, British Journal of Psychiatry.

[34] Enzo Pasquale Scilingo et al., Electrodermal Activity in Bipolar Patients during Affective Elicitation, 2014, IEEE Journal of Biomedical and Health Informatics.

[35] Nicholas B. Allen et al., Detection of Clinical Depression in Adolescents' Speech During Family Interactions, 2011, IEEE Transactions on Biomedical Engineering.

[36] Nicholas B. Allen et al., Influence of acoustic low-level descriptors in the detection of clinical depression in adolescents, 2010, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[37] Maja J. Mataric et al., A Framework for Automatic Human Emotion Classification Using Emotion Profiles, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[38] S. Leucht et al., Efficacy and extrapyramidal side-effects of the new antipsychotics olanzapine, quetiapine, risperidone, and sertindole compared to conventional antipsychotics and placebo: a meta-analysis of randomized controlled trials, 1999, Schizophrenia Research.

[39] Enzo Pasquale Scilingo et al., A pattern recognition approach based on electrodermal response for pathological mood identification in bipolar disorders, 2014, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[40] Björn W. Schuller et al., The INTERSPEECH 2009 Emotion Challenge, 2009, INTERSPEECH.

[41] George Trigeorgis et al., End-to-End Multimodal Emotion Recognition Using Deep Neural Networks, 2017, IEEE Journal of Selected Topics in Signal Processing.

[42] Chung-Hsien Wu et al., Error Weighted Semi-Coupled Hidden Markov Model for Audio-Visual Emotion Recognition, 2012, IEEE Transactions on Multimedia.

[43] Mitchell D. Wilkes et al., Evaluation of Voice Acoustics as Predictors of Clinical Depression Scores, 2017, Journal of Voice.

[44] Ning An et al., Speech Emotion Recognition Using Fourier Parameters, 2015, IEEE Transactions on Affective Computing.

[45] Xiaojie Yu et al., Combining feature selection and representation for speech emotion recognition, 2016, IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[46] Yoshua Bengio et al., Extracting and composing robust features with denoising autoencoders, 2008, ICML.

[47] Thomas F. Quatieri et al., On the relative importance of vocal source, system, and prosody in human depression, 2013, IEEE International Conference on Body Sensor Networks.