Sequential fusion of facial appearance and dynamics for depression recognition

Abstract

In mental health assessment, it has been validated that nonverbal cues such as facial expressions can be indicative of depressive disorders. Recently, multimodal fusion of facial appearance and dynamics based on convolutional neural networks has shown encouraging performance in depression analysis. However, the correlation and complementarity between different visual modalities have not been well studied in prior methods. In this paper, we propose a sequential fusion method for facial depression recognition. To mine correlated and complementary depression patterns in multimodal learning, we introduce a chained-fusion mechanism that jointly learns facial appearance and dynamics in a unified framework. We show that such sequential fusion provides a probabilistic perspective for modeling the correlation and complementarity between two different data modalities, leading to improved depression recognition. Results on a benchmark dataset demonstrate the superiority of our method over several state-of-the-art alternatives.
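The general idea of chained (sequential) fusion can be illustrated with a minimal sketch. This is not the authors' architecture. The CNN streams are replaced by ordinary least-squares regressors, the features are synthetic, and the names `fit_linear`, `chained_fusion` and `mse` are hypothetical. Stage 1 predicts a depression score from appearance features only. Stage 2 receives the dynamics features together with the stage-1 estimate, so in a loose probabilistic reading it approximates E[y | x_dyn, ŷ_app] instead of modeling each modality in isolation.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an appended bias column; returns weights."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def predict_linear(w, X):
    """Apply weights from fit_linear (bias is the last weight)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def chained_fusion(X_app, X_dyn, y):
    """Two-stage chained fusion (hypothetical linear stand-in for CNN streams).

    Stage 1 sees appearance features only. Stage 2 sees dynamics features
    plus the stage-1 estimate, so it can model what dynamics add on top of
    appearance while reusing what the appearance stage already captured.
    """
    w1 = fit_linear(X_app, y)
    y1 = predict_linear(w1, X_app)           # appearance-only estimate
    X2 = np.hstack([X_dyn, y1[:, None]])     # chain: feed stage-1 output forward
    w2 = fit_linear(X2, y)
    y2 = predict_linear(w2, X2)              # fused estimate
    return y1, y2


def mse(a, b):
    """Mean squared error between two 1-D arrays."""
    return float(np.mean((a - b) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    X_app = rng.normal(size=(n, 8))   # hypothetical appearance features
    X_dyn = rng.normal(size=(n, 6))   # hypothetical dynamics features
    # Synthetic score that depends on both modalities plus noise.
    y = 3 * X_app[:, 0] + 2 * X_dyn[:, 0] + rng.normal(scale=0.5, size=n)
    y1, y2 = chained_fusion(X_app, X_dyn, y)
    print(f"appearance-only MSE: {mse(y1, y):.3f}")
    print(f"chained fusion MSE:  {mse(y2, y):.3f}")
```

Because the stage-1 estimate is itself a regressor in stage 2, the chained model's training error under ordinary least squares can never exceed that of the appearance-only stage (weight 1 on the estimate and 0 elsewhere is always feasible). With learned networks and held-out data this guarantee no longer holds, so any gain has to be shown empirically.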
