Speech Emotion Recognition Using 2D-CNN with Data Augmentation

Speech emotion recognition remains a challenging problem, especially in human-machine interaction. People express emotions in different ways, and it is still unclear which speech features best distinguish one emotion from another. Speech reflects the speaker's mental and psychological state, which is directly influenced by emotion. This research proposes a speech emotion recognition method based on a 2D-CNN model that takes Log-Mel spectrograms as input. This combination captures the salient characteristics of the speech signal. The proposed model was evaluated on the EMODB database with augmented data. It achieved an accuracy of 0.88, higher than that of existing deep-learning models for emotion recognition.
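
To make the input representation concrete, the following is a minimal NumPy sketch of how a Log-Mel spectrogram can be computed from a waveform and how a simple augmented copy can be produced. It is an illustration under stated assumptions, not the configuration used in this work: the sampling rate, FFT size, hop length, and number of Mel bands are placeholder values, and additive Gaussian noise is only one plausible augmentation, since the augmentation method is not specified here. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Hz -> Mel conversion.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the Mel scale from 0 Hz to sr/2.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    # All default parameter values are assumptions chosen for illustration.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    # Small floor avoids log(0); result is in dB with shape (n_mels, n_frames).
    return 10.0 * np.log10(mel + 1e-10).T

def add_noise(signal, snr_db, rng):
    # Assumed augmentation: white Gaussian noise at a target signal-to-noise ratio.
    noise_power = np.mean(signal ** 2) / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

# Usage: a one-second synthetic tone stands in for a speech recording.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
rng = np.random.default_rng(0)
clean_spec = log_mel_spectrogram(tone, sr=sr)
noisy_spec = log_mel_spectrogram(add_noise(tone, snr_db=20.0, rng=rng), sr=sr)
print(clean_spec.shape, noisy_spec.shape)  # (64, 97) (64, 97)
```

Each resulting array is a two-dimensional time-frequency image, which is why it can be passed to a 2D-CNN in the same way as a single-channel picture.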