Speech-Based Stress Classification based on Modulation Spectral Features and Convolutional Neural Networks

Interest in stress recognition has increased notably over the past few years. In this work, we focus on recognizing stress from speech. We propose using modulation spectral features as input to a convolutional neural network (CNN) for stress classification. As a benchmark, the openSMILE feature set used in the INTERSPEECH 2010 Paralinguistic Challenge is adopted and evaluated with support vector machine (SVM) and deep neural network (DNN) backends. Experiments are performed on the well-known Speech Under Simulated and Actual Stress (SUSAS) database. Performance is investigated for 2-class, 4-class, and 9-class classification problems. Results show that the proposed approach outperforms the benchmark on the challenging 9-class task, reaching accuracies as high as 70%, which represents a gain of roughly 18% over the benchmark.
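To illustrate what a modulation spectral representation looks like, the sketch below computes a single matrix for one utterance. Each entry is the log energy in one acoustic frequency band and one modulation frequency band. A 2-D array of this kind is the sort of input a CNN can consume. This is a simplified sketch and not the authors' pipeline. It makes several assumptions: brick-wall FFT band-pass filters with log-spaced edges stand in for an auditory (e.g. gammatone) filterbank, the envelope is an FFT-based Hilbert envelope, and the whole utterance is analyzed at once rather than frame by frame. The function names and parameter values (`n_acoustic`, `mod_edges`, `fmin`) are hypothetical, chosen only for illustration.

```python
import numpy as np


def analytic_envelope(x: np.ndarray) -> np.ndarray:
    """Hilbert envelope: magnitude of the analytic signal, built via the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0  # keep DC and Nyquist once
        h[1:n // 2] = 2.0       # double positive frequencies
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    # Negative frequencies stay zeroed, which yields the analytic signal.
    return np.abs(np.fft.ifft(spectrum * h))


def bandpass_fft(x: np.ndarray, fs: float, lo: float, hi: float) -> np.ndarray:
    """Brick-wall band-pass in the frequency domain (assumed stand-in for an auditory filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def modulation_spectrogram(x, fs, n_acoustic=8, mod_edges=(2, 4, 8, 16, 32, 64), fmin=100.0):
    """Return an (n_acoustic, len(mod_edges) - 1) matrix of log modulation energies.

    Hypothetical parameters: log-spaced acoustic bands from fmin to fs/2 and
    octave-spaced modulation bands given by mod_edges (in Hz).
    """
    acoustic_edges = np.geomspace(fmin, fs / 2, n_acoustic + 1)
    feats = np.zeros((n_acoustic, len(mod_edges) - 1))
    for i in range(n_acoustic):
        # 1) Isolate one acoustic band and take its temporal envelope.
        band = bandpass_fft(x, fs, acoustic_edges[i], acoustic_edges[i + 1])
        env = analytic_envelope(band)
        env = env - env.mean()  # drop the DC of the envelope
        # 2) Spectrum of the envelope = modulation spectrum of this band.
        mod_power = np.abs(np.fft.rfft(env)) ** 2
        mod_freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
        # 3) Pool modulation energy into the modulation bands.
        for j in range(len(mod_edges) - 1):
            mask = (mod_freqs >= mod_edges[j]) & (mod_freqs < mod_edges[j + 1])
            feats[i, j] = mod_power[mask].sum()
    return np.log(feats + 1e-12)  # small floor avoids log(0)
```

As a usage check, a 1 kHz tone amplitude-modulated at 4 Hz concentrates its energy in the acoustic band that contains 1 kHz and in the 4 to 8 Hz modulation band:

```python
fs = 16000
t = np.arange(fs) / fs
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)
M = modulation_spectrogram(x, fs)
print(M.shape)                                   # (8, 5)
print(np.unravel_index(np.argmax(M), M.shape))   # acoustic band 4, modulation band 1
```

Stacking such matrices over successive frames would give a 3-D tensor. A CNN could then learn joint acoustic and modulation patterns from it, whereas a fixed-length feature vector such as the openSMILE set is better suited to an SVM or DNN.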
