Attention-based convolutional neural network and long short-term memory for short-term detection of mood disorders based on elicited speech responses

Abstract: Mood disorders, including unipolar depression (UD) and bipolar disorder (BD), are among the most common mental health disorders. Because BD lacks diagnostic markers, it can be misdiagnosed as UD on initial presentation. Short-term detection, which could support early detection and intervention, is therefore desirable. This study proposed an approach for short-term detection of mood disorders based on elicited speech responses. Speech responses were obtained through interviews conducted by a clinician after participants viewed six emotion-eliciting videos. A domain adaptation method based on a hierarchical spectral clustering algorithm was proposed to adapt a labeled emotion database to the collected unlabeled mood database, alleviating the data bias problem in the emotion space. To model the local variation of emotions within each response, a convolutional neural network (CNN) with an attention mechanism was used to generate an emotion profile (EP) for each elicited speech response. Long short-term memory (LSTM) was then employed to characterize the temporal evolution of the EPs across all six speech responses. An attention model was also applied to the LSTM network to highlight pertinent speech responses rather than treating all responses equally, with the aim of improving detection performance. For evaluation, this study elicited emotional speech data from 15 people with BD, 15 people with UD, and 15 healthy controls. Leave-one-group-out cross-validation was used to evaluate the proposed method on the compiled database. The CNN- and LSTM-based attention models improved the mood disorder detection accuracy of the proposed method by approximately 11%. Furthermore, the proposed method achieved an overall detection accuracy of 75.56%, outperforming support-vector-machine-based (62.22%) and CNN-based (66.67%) methods.
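The response-level attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dot-product scoring, the `query` vector, the function names, and the toy numbers are all assumptions made for illustration, and in a real model the six state vectors would come from an LSTM run over the emotion profiles rather than being written by hand.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    states: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Score each per-response state against a (hypothetical) query vector
    with a dot product, normalize the scores with softmax, and return the
    attention-weighted sum of the states together with the weights."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return pooled, weights


# Toy stand-ins for the states of the six elicited speech responses
# (assumed values, three dimensions each).
toy_states = [
    [0.1, 0.7, 0.2],
    [0.3, 0.3, 0.4],
    [0.6, 0.2, 0.2],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
    [0.4, 0.4, 0.2],
]
toy_query = [1.0, 0.0, -1.0]  # hypothetical learned parameter

pooled, weights = attention_pool(toy_states, toy_query)
print("attention weights:", [round(w, 3) for w in weights])
print("pooled representation:", [round(v, 3) for v in pooled])
```

With an all-zero query every response receives the same weight, which corresponds to the "treating all responses equally" baseline that the abstract contrasts with attention.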
