Emotion recognition from speech using convolutional neural network with recurrent neural network architecture

Emotion recognition is a difficult problem, particularly when it is performed using the speech signal alone. Much significant research has been done on emotion recognition from speech. The primary challenges are choosing the emotion recognition corpus (speech database), identifying suitable speech features, and selecting an appropriate classification model. In this article we use 13 MFCCs (Mel-Frequency Cepstral Coefficients) together with their 13 velocity (delta) and 13 acceleration (delta-delta) components as features, and a CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) based approach for classification. We chose the Berlin Emotional Speech database (EmoDB) for the classification task. We achieve approximately 80% accuracy on test data.
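The 39-dimensional feature vector described above (13 MFCCs stacked with their velocity and acceleration components) can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the MFCC matrix is assumed to come from a standard extractor (e.g. librosa or python_speech_features), so random data stands in for it here, and the delta window width of 2 frames is an assumed default.

```python
import numpy as np

def delta(feat, width=2):
    """First-order regression (delta) coefficients over a +/- `width` frame window."""
    # Pad frame axis at the edges so every frame has a full regression window.
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    # Standard regression formula: sum_n n * (c[t+n] - c[t-n]) / denom
    return sum(
        n * (padded[width + n:len(feat) + width + n]
             - padded[width - n:len(feat) + width - n])
        for n in range(1, width + 1)
    ) / denom

def stack_features(mfcc):
    """Stack 13 MFCCs with velocity and acceleration into 39-dim frame vectors."""
    d1 = delta(mfcc)   # velocity (first derivative)
    d2 = delta(d1)     # acceleration (second derivative)
    return np.hstack([mfcc, d1, d2])

# Hypothetical stand-in for real MFCCs: 100 frames x 13 coefficients.
mfcc = np.random.randn(100, 13)
features = stack_features(mfcc)  # shape (100, 39), ready for a CNN+LSTM classifier
```

Each row of `features` is one analysis frame; a sequence of such rows is what a CNN front end followed by an LSTM would consume for utterance-level emotion classification.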
