Cell-Coupled Long Short-Term Memory With $L$-Skip Fusion Mechanism for Mood Disorder Detection Through Elicited Audiovisual Features

In the early stages of mood disorder diagnosis, patients with bipolar disorder are frequently misdiagnosed as having unipolar depression. Because long-term monitoring delays the detection of mood disorders, an accurate one-time diagnosis is desirable to avoid postponing appropriate treatment through misdiagnosis. In this paper, an elicitation-based approach is proposed for realizing a one-time diagnosis by using responses elicited from patients while they watch six emotion-eliciting videos. After each video clip, the conversation between the participant and the interviewing clinician, including the patient's facial expressions and speech responses, was recorded. Next, a hierarchical spectral clustering algorithm was employed to adapt the facial expression and speech response features by using the extended Cohn–Kanade and eNTERFACE databases. A denoising autoencoder was then applied to extract bottleneck features from the adapted data. The facial and speech bottleneck features were input into support vector machines to obtain speech emotion profiles (EPs) and the modulation spectrum (MS) of the facial action unit sequence for each elicited response. Finally, a cell-coupled long short-term memory (LSTM) network with an $L$-skip fusion mechanism was proposed to model the temporal information across all elicited responses and to loosely fuse the EPs and the MS for mood disorder detection. The experimental results revealed that the cell-coupled LSTM with the $L$-skip fusion mechanism offers promising accuracy and efficacy for mood disorder detection.
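The fusion scheme described above can be illustrated with a minimal toy sketch: two LSTM streams, one over the speech EPs and one over the facial MS features, whose memory cells are coupled (here, by averaging) only every $L$ steps, giving the loose cell-level fusion the abstract describes. All names, dimensions, and the averaging rule below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyLSTMCell:
    """Minimal single-layer LSTM cell (hypothetical toy, not the paper's code)."""
    def __init__(self, in_dim, hid_dim):
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.hid_dim = hid_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hid_dim
        i = sigmoid(z[0:H])          # input gate
        f = sigmoid(z[H:2 * H])      # forget gate
        g = np.tanh(z[2 * H:3 * H])  # candidate cell update
        o = sigmoid(z[3 * H:4 * H])  # output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new

def coupled_lstm_l_skip(ep_seq, ms_seq, hid_dim=8, L=2):
    """Run two LSTM streams (speech EPs, facial MS) over equal-length
    sequences and couple their cell states every L steps; return the
    concatenated final hidden states as the fused representation."""
    cell_a = ToyLSTMCell(ep_seq.shape[1], hid_dim)
    cell_b = ToyLSTMCell(ms_seq.shape[1], hid_dim)
    h_a, c_a = np.zeros(hid_dim), np.zeros(hid_dim)
    h_b, c_b = np.zeros(hid_dim), np.zeros(hid_dim)
    for t in range(len(ep_seq)):
        h_a, c_a = cell_a.step(ep_seq[t], h_a, c_a)
        h_b, c_b = cell_b.step(ms_seq[t], h_b, c_b)
        if (t + 1) % L == 0:
            # Cell-level coupling: average the two memory cells.
            c_a = c_b = 0.5 * (c_a + c_b)
    return np.concatenate([h_a, h_b])

# Six elicited responses, with 7-dim EP and 5-dim MS features per response.
fused = coupled_lstm_l_skip(rng.normal(size=(6, 7)), rng.normal(size=(6, 5)))
```

In a full system, `fused` would feed a classifier for the bipolar-versus-unipolar decision; the toy only shows where the $L$-skip coupling sits in the recurrence.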
