Clinical Depression and Affect Recognition with EmoAudioNet

Automatic analysis of emotions and affect from speech is an inherently challenging problem with a broad range of applications in Human-Computer Interaction (HCI), health informatics, assistive technologies and multimedia retrieval. Understanding a person's specific and basic emotions and reacting accordingly can improve HCI. Moreover, machines that can understand human emotions during interactions between people can assist humans with a form of socio-affective intelligence. In this paper, we present a deep neural network-based architecture called EmoAudioNet, which studies the time-frequency representation of the audio signal and the visual representation of its frequency spectrum. EmoAudioNet is applied to two tasks: automatic clinical depression recognition and continuous dimensional emotion recognition from speech. Extensive experiments show that the proposed approach significantly outperforms state-of-the-art approaches on the RECOLA and DAIC-WOZ databases. These competitive results call for applying EmoAudioNet to other affect and emotion recognition tasks from speech.
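
To make the notion of a time-frequency representation concrete, the sketch below computes a magnitude spectrogram from a short signal using only the Python standard library. It is an illustration under stated assumptions, not the EmoAudioNet pipeline: the function names `hann` and `magnitude_spectrogram`, the frame length, the hop size and the toy 1 kHz tone are all hypothetical choices made for this example, and the naive DFT would be replaced by an FFT library in practice.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the frame to reduce spectral leakage.
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT over the non-negative frequencies (illustrative only).
        mags = []
        for k in range(n_bins):
            acc = sum(
                frame[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Toy input (assumption): a 1 kHz tone sampled at 8 kHz, 256 samples long.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

spec = magnitude_spectrogram(tone)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time frame and each column one frequency bin, so the result can be treated either as a feature matrix or rendered as an image, which is the sense in which a spectrum can be given a visual representation.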
