Automatic Detection of Depression in Speech Using Ensemble Convolutional Neural Networks

This paper proposes a speech-based method for automatic depression classification. The system is based on ensemble learning for Convolutional Neural Networks (CNNs) and is evaluated using the data and experimental protocol of the Depression Classification Sub-Challenge (DCC) of the 2016 Audio–Visual Emotion Challenge (AVEC-2016). In the pre-processing phase, each speech file is represented as a sequence of log-spectrograms, which are randomly sampled to balance positive and negative samples. For classification, a One-Dimensional CNN architecture better suited to this task is first designed. Several of these CNN-based models are then trained with different initializations. Their individual predictions are fused with an Ensemble Averaging algorithm and combined per speaker to reach a final decision. The proposed ensemble system achieves satisfactory results on the DCC at AVEC-2016. It is compared with three alternatives: a reference system based on Support Vector Machines and hand-crafted features, DepAudioNet (a system combining CNNs and Long Short-Term Memory networks), and a single CNN-based classifier.
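The pre-processing and decision stages described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The abstract does not give the FFT size, hop length, or segment length, so `n_fft=512`, `hop=256` and `frames_per_segment=100` are placeholders. "Randomly sampled to balance" is interpreted here as random undersampling of the majority class. The per-speaker combination is assumed to be the mean of segment scores followed by a 0.5 threshold. The trained 1-D CNNs themselves are not shown; their per-segment probabilities are stand-in arrays.

```python
import numpy as np


def log_spectrogram(signal, n_fft=512, hop=256, eps=1e-10):
    """Log-magnitude spectrogram of shape (n_frames, n_fft // 2 + 1).

    Assumed settings: Hann window, n_fft=512, hop=256 (not given in the abstract).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + eps)


def segment(spec, frames_per_segment=100):
    """Split a spectrogram into non-overlapping fixed-length segments.

    Trailing frames that do not fill a whole segment are dropped.
    """
    n = spec.shape[0] // frames_per_segment
    return spec[:n * frames_per_segment].reshape(n, frames_per_segment, spec.shape[1])


def balance(labels, rng):
    """Assumed balancing scheme: random undersampling.

    Keeps every minority-class segment plus an equally sized random subset
    of the majority class; returns the kept indices in sorted order.
    """
    labels = np.asarray(labels)
    pos, neg = np.flatnonzero(labels == 1), np.flatnonzero(labels == 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.choice(majority, size=len(minority), replace=False)
    return np.sort(np.concatenate([minority, kept]))


def ensemble_average(model_probs):
    """Ensemble Averaging: mean P(depressed) across models for each segment.

    model_probs has shape (n_models, n_segments), one row per CNN trained
    from a different initialization.
    """
    return np.mean(np.asarray(model_probs), axis=0)


def speaker_decisions(segment_probs, speaker_ids, threshold=0.5):
    """Assumed per-speaker fusion: average segment scores, then threshold.

    Returns {speaker_id: 1 (depressed) or 0 (not depressed)}.
    """
    segment_probs = np.asarray(segment_probs)
    speaker_ids = np.asarray(speaker_ids)
    return {spk: int(segment_probs[speaker_ids == spk].mean() >= threshold)
            for spk in sorted(set(speaker_ids.tolist()))}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = log_spectrogram(rng.standard_normal(48000))  # 3 s of noise at 16 kHz
    print(segment(spec).shape)  # (1, 100, 257)
```

Averaging probabilities (rather than taking a majority vote over hard labels) keeps the confidence of each model in the fused score, which is the usual reading of Ensemble Averaging. The same mean-then-threshold idea is reused here at the speaker level, but a vote over segment decisions would be an equally plausible alternative.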
