Emotion Recognition from Speech: Stress Experiment

The goal of this work is to introduce an architecture that automatically detects the amount of stress in a speech signal in close to real time. To this end, an experimental setup for recording speech that is rich in vocabulary and contains different stress levels is presented, together with the labeling process and a thorough analysis of the labeled data. Fifteen subjects were asked to play an air controller simulation game that gradually induced more stress by becoming harder to control. During the game the subjects answered questions, and these answers were then labeled by a different set of subjects to obtain a subjective target value for each answer. After training, a recurrent neural network was used to estimate the amount of stress in the utterances. The network produced a stress estimate at a frequency of 25 Hz and outperformed the human baseline.
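The abstract specifies only that a recurrent neural network emits one stress estimate per frame at 25 Hz. A minimal sketch of one plausible realization follows, assuming an echo state network with a ridge-regression readout over frame-level speech features; the feature dimension, reservoir size, hyperparameters, and dummy data are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical echo-state-network sketch for frame-level stress estimation.
# Sizes and the feature pipeline are illustrative; the paper only states
# that a recurrent network outputs one stress estimate per frame (25 Hz,
# i.e. one 40 ms frame step).

import numpy as np

rng = np.random.default_rng(0)

N_IN, N_RES = 26, 200                          # assumed feature dim / reservoir size
W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))   # fixed random input weights
W = rng.uniform(-0.5, 0.5, (N_RES, N_RES))     # fixed random recurrent weights
W *= 0.9 / max(abs(np.linalg.eigvals(W)))      # scale spectral radius below 1

def run_reservoir(features):
    """features: (T, N_IN) array, one row per 40 ms frame (25 Hz)."""
    x = np.zeros(N_RES)
    states = np.empty((len(features), N_RES))
    for t, u in enumerate(features):
        x = np.tanh(W_in @ u + W @ x)          # standard tanh reservoir update
        states[t] = x
    return states

def train_readout(states, targets, ridge=1e-4):
    """Ridge-regression readout mapping reservoir states to stress labels."""
    A = states.T @ states + ridge * np.eye(N_RES)
    return np.linalg.solve(A, states.T @ targets)

# Usage: one stress estimate per frame, i.e. 25 estimates per second.
feats = rng.standard_normal((250, N_IN))       # 10 s of dummy features
labels = rng.uniform(0.0, 1.0, 250)            # dummy frame-wise stress targets
W_out = train_readout(run_reservoir(feats), labels)
stress = run_reservoir(feats) @ W_out          # frame-wise stress predictions
```

Only the linear readout is trained here, which is what makes this family of networks attractive for near-real-time use: the per-frame cost at inference is one reservoir update plus one dot product.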
