Bipolar Disorder Recognition with Histogram Features of Arousal and Body Gestures

This paper addresses the Bipolar Disorder Challenge (BDC) task of the Audio Visual Emotion Challenge (AVEC) 2018. Two novel features are proposed: 1) a histogram-based arousal feature, in which continuous arousal values are estimated from audio cues by a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) model; and 2) a Histogram of Displacement Range (HDR) based upper body posture feature, which characterizes the displacement and velocity of key body points in a video segment. In addition, we propose a multi-stream bipolar disorder classification framework with Deep Neural Networks (DNNs) and a Random Forest, and adopt an ensemble learning strategy to alleviate possible over-fitting caused by the limited training data. Experimental results show that the proposed arousal and upper body posture features are discriminative for different bipolar episodes, and that the framework achieves an unweighted average recall (UAR) of 0.714 on the development set, higher than the baseline result of 0.635. On the test set, our system obtains the same UAR (0.574) as the challenge baseline.
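
As a rough illustration of two ideas named in the abstract, the sketch below turns a sequence of continuous arousal values into a fixed-length histogram feature and computes the unweighted average recall (UAR), i.e. the mean of the per-class recalls. It is not the authors' implementation: the function names, the bin count, the value range of [-1, 1], and the normalization to relative frequencies are all assumptions made for illustration, and the arousal values are supplied directly rather than predicted by an LSTM-RNN.

```python
import numpy as np


def histogram_feature(values, n_bins=10, value_range=(-1.0, 1.0)):
    """Summarize a variable-length sequence as a fixed-length histogram.

    n_bins and value_range are illustrative assumptions, not values
    taken from the paper. Values outside value_range are ignored.
    """
    values = np.asarray(values, dtype=float)
    counts, _ = np.histogram(values, bins=n_bins, range=value_range)
    total = counts.sum()
    # Normalize to relative frequencies so sequences of different
    # lengths yield comparable features.
    return counts / total if total > 0 else counts.astype(float)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    # Hypothetical arousal trace for one recording.
    arousal = [-0.9, -0.9, 0.1, 0.9]
    print(histogram_feature(arousal, n_bins=4))

    # Hypothetical three-class labels and predictions.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(unweighted_average_recall(y_true, y_pred))
```

Because UAR averages recall over classes rather than over samples, a classifier cannot score well simply by favoring the most frequent class, which matters when the classes are unevenly represented.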
