Multimodal assistive technologies for depression diagnosis and monitoring

Depression is a severe mental health disorder with high societal costs. Current clinical practice depends almost exclusively on self-report and clinical opinion, risking a range of subjective biases. The long-term goal of our research is to develop assistive technologies that support clinicians and sufferers in diagnosis and in monitoring treatment progress in a timely and easily accessible format. In the first phase, we aim to develop a diagnostic aid using affective sensing approaches. This paper describes the progress to date and proposes a novel multimodal framework comprising audio-video fusion for depression diagnosis. We exploit the well-known proposition from auditory-visual speech processing that the auditory and visual channels of human communication complement each other, and investigate this hypothesis for depression analysis. For the video data, intra-facial muscle movements and the movements of the head and shoulders are analysed by computing spatio-temporal interest points. In addition, various audio features (fundamental frequency f0, loudness, intensity and mel-frequency cepstral coefficients) are computed. Next, a bag of visual features and a bag of audio features are generated separately. In this study, we compare fusion methods at the feature level, score level and decision level. Experiments are performed on an age- and gender-matched clinical dataset of 30 patients and 30 healthy controls. The results of the multimodal experiments show the proposed framework's effectiveness for depression analysis.
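The three fusion strategies compared above can be sketched as follows. This is a minimal illustrative example, not the paper's actual pipeline: the bag-of-words histograms are replaced by synthetic random features, and scikit-learn SVMs stand in for the study's classifiers.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d_audio, d_video = 60, 20, 30            # 30 patients + 30 controls, as in the study
y = np.array([0] * 30 + [1] * 30)           # 0 = healthy control, 1 = depressed
# Synthetic stand-ins for the bag-of-audio-words / bag-of-video-words histograms
X_audio = rng.random((n, d_audio)) + y[:, None] * 0.3
X_video = rng.random((n, d_video)) + y[:, None] * 0.3

# Feature-level fusion: concatenate modality features, then train one classifier.
clf_feat = SVC(probability=True).fit(np.hstack([X_audio, X_video]), y)

# For score- and decision-level fusion, train one classifier per modality.
clf_a = SVC(probability=True).fit(X_audio, y)
clf_v = SVC(probability=True).fit(X_video, y)

def score_level(Xa, Xv):
    # Score-level fusion: average the per-modality posteriors, then threshold.
    p = (clf_a.predict_proba(Xa)[:, 1] + clf_v.predict_proba(Xv)[:, 1]) / 2
    return (p >= 0.5).astype(int)

def decision_level(Xa, Xv):
    # Decision-level fusion: combine hard labels; with two modalities,
    # agreement decides and ties fall back to the fused scores.
    la, lv = clf_a.predict(Xa), clf_v.predict(Xv)
    return np.where(la == lv, la, score_level(Xa, Xv))

pred_feat = clf_feat.predict(np.hstack([X_audio, X_video]))
pred_score = score_level(X_audio, X_video)
pred_dec = decision_level(X_audio, X_video)
```

Feature-level fusion lets the classifier learn cross-modal interactions, whereas score- and decision-level fusion keep the modalities independent until the final step, which is more robust when one modality is missing or noisy.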
