An Unsupervised Frame Selection Technique for Robust Emotion Recognition in Noisy Speech

Automatic emotion recognition with good accuracy has been demonstrated for clean speech, but performance deteriorates quickly when the speech is contaminated with noise. In this paper, we propose a front-end voice activity detector (VAD)-based unsupervised method to select the frames of a spoken utterance with a relatively higher signal-to-noise ratio (SNR). We then extract a large number of statistical features from low-level audio descriptors and perform emotion recognition with state-of-the-art classifiers. Extensive experiments have been carried out on two standard databases contaminated with five types of noise (Babble, F-16, Factory, Volvo, and HF-channel) from the NOISEX-92 noise database at five SNR levels (0, 5, 10, 15, and 20 dB). Across all experiments, classifying emotions in both the categorical and the dimensional spaces, the proposed technique outperforms a Recurrent Neural Network (RNN)-based VAD for all five noise types and SNR levels, and on both databases.
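As a rough illustration of the frame-selection idea (and not the paper's exact VAD criterion), the minimal sketch below scores frames by log-energy and keeps only the highest-scoring fraction as an unsupervised proxy for the higher-SNR frames; the function name select_frames and the frame_ms, hop_ms, and keep_ratio parameters are hypothetical choices introduced here for illustration only.

```python
# Hypothetical sketch of unsupervised frame selection by relative frame energy.
# This is not the paper's exact VAD-based criterion; it only illustrates keeping
# frames whose energy (a crude SNR proxy) exceeds an utterance-level threshold.
import numpy as np

def select_frames(signal, sr, frame_ms=25, hop_ms=10, keep_ratio=0.6):
    """Return indices of frames with relatively higher log-energy.

    signal: 1-D numpy array of audio samples, sr: sampling rate in Hz.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)

    # Log-energy per frame (small constant avoids log(0) on silent frames)
    energies = np.array([
        np.log(np.sum(signal[i * hop_len:i * hop_len + frame_len] ** 2) + 1e-10)
        for i in range(n_frames)
    ])

    # Keep the top `keep_ratio` fraction of frames by energy,
    # using an utterance-level quantile as the unsupervised threshold.
    threshold = np.quantile(energies, 1.0 - keep_ratio)
    return np.where(energies >= threshold)[0]
```

In such a pipeline, the low-level audio descriptors would then be computed only over the returned frame indices before aggregating them into utterance-level statistical features for the classifier.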
