Effect of Reverberation in Speech-based Emotion Recognition

In room environments, echo, reverberation, interference, and additive noise pose major challenges for speech-based emotion recognition because they degrade the quality and reliability of recorded speech signals. In this paper, we investigate the effects of reverberation and noise on speech-based emotion recognition by comparing four conditions: clean speech, speech with simulated reverberation, de-reverberated speech, and speech with added noise. First, we develop an emotional speech corpus containing these four kinds of data. Then we apply the GMM-UBM framework to evaluate emotion recognition performance on each of them. Results show that reverberation reduces emotion recognition accuracy by 5.87%, and that de-reverberation can largely recover this loss.
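The abstract does not say how the reverberant and noisy conditions were generated. As a rough illustration only, the sketch below assumes a synthetic room impulse response made of exponentially decaying Gaussian noise, and white Gaussian noise added at a target SNR. The function names (`synthetic_rir`, `convolve`, `add_noise`) and all parameter values (sample rate, RT60, SNR) are hypothetical choices for this example, not the authors' setup.

```python
import math
import random


def synthetic_rir(rt60_s, sample_rate, length_s, seed=0):
    """Crude room impulse response: Gaussian noise with exponential decay.

    The amplitude envelope falls by 60 dB (a factor of 10**-3) after rt60_s
    seconds. This is an assumed stand-in for a measured or simulated RIR.
    """
    rng = random.Random(seed)
    n = int(length_s * sample_rate)
    decay = 3.0 * math.log(10.0) / rt60_s  # exp(-decay * rt60_s) == 1e-3
    rir = [rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate)
           for i in range(n)]
    rir[0] = 1.0  # direct path
    return rir


def convolve(signal, kernel):
    """Full linear convolution (length len(signal) + len(kernel) - 1)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def add_noise(signal, snr_db, seed=1):
    """Add white Gaussian noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_sigma = math.sqrt(signal_power / (10.0 ** (snr_db / 10.0)))
    return [x + rng.gauss(0.0, noise_sigma) for x in signal]


# Hypothetical "clean" input: a 200 Hz tone, 50 ms at 8 kHz.
SAMPLE_RATE = 8000
clean = [math.sin(2.0 * math.pi * 200.0 * t / SAMPLE_RATE) for t in range(400)]

rir = synthetic_rir(rt60_s=0.3, sample_rate=SAMPLE_RATE, length_s=0.1)
reverberant = convolve(clean, rir)       # simulated reverberant condition
noisy = add_noise(clean, snr_db=10.0)    # additive-noise condition
```

Convolving with a decaying impulse response smears each sample over the following samples, which is the basic effect that a de-reverberation stage tries to undo. A real experiment would more likely use measured or geometrically simulated impulse responses and recorded noise.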
