Deep Learning and Audio-Based Emotion Recognition

Just as the tightening and relaxation of our facial muscles produce the changes we call facial expressions in response to different emotional states of the brain, our voice also exhibits physiological changes in tone, loudness, rhythm, and intonation. These visual and auditory cues are of great importance for human-human, human-machine, and human-computer interaction, as they carry critical information about a person's emotional state. Automatic emotion recognition systems are systems that analyze an individual's emotional state using this distinctive information. In this study, an automatic emotion recognition system that analyzes and classifies auditory information to recognize human emotions is proposed. Spectral features and MFCC coefficients, which are commonly used for feature extraction from voice signals, are extracted first, and a deep learning-based LSTM network is then used for classification. The proposed algorithm is evaluated on three audio datasets (SAVEE, RAVDESS, and RML).
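As a minimal illustration of the MFCC feature-extraction step described above, the following NumPy-only sketch frames a signal, computes a mel-filterbank power spectrum, and applies a DCT to obtain cepstral coefficients. The frame length, hop size, and filterbank sizes here are common illustrative defaults, not necessarily the exact configuration used in the study:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    # Mel-spaced triangular filters spanning 0 .. sr/2
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)   # rising slope
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)   # falling slope
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    # Frame the signal (25 ms windows, 10 ms hop at 16 kHz) and window it
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum -> mel filterbank energies -> log -> DCT-II
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_e = np.log(energies + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1) / (2 * n_filters)))
    return log_e @ dct.T  # shape: (n_frames, n_coeffs)

# 1 second of a 440 Hz tone as a stand-in for a speech utterance
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (98, 13)
```

The resulting per-frame coefficient sequence is exactly the kind of variable-length feature matrix that a recurrent classifier such as an LSTM consumes, one time step per frame.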
