How Intense Are Your Words? Understanding Emotion Intensity from Speech

Speech-based emotion recognition has emerged as an important research area in affective computing. Despite many advances in emotion recognition, the intensity of the expressed emotion has remained largely unexplored. Estimating the intensity of an emotion is considerably harder than distinguishing between emotions. The spectral characteristics of a speech signal carry information that is needed to distinguish emotional intensity. Line Spectral Frequency (LSF) features offer a spectral representation of the speech signal and also model its formant structure. Such features have been used to model emotions from speech, but they have not been explored for emotion intensity detection. In this paper, we explore LSF features for emotion intensity characterization. To make the proposed approach suitable for settings with limited computational resources, we present a low-dimensional version of LSF: the Band Dominance and Dynamics LSF (BDD-LSF). The proposed BDD-LSF is based on an analysis of frequency-band dominance and dynamics in the LSF features. This feature can handle the intra-class variation between intensity levels across a range of emotional states. On the publicly available RAVDESS dataset, we achieve a best accuracy of 75.75% for distinguishing emotional intensities. Our system also outperforms previously reported approaches that use deep learning-based techniques.
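
As an illustration of the LSF representation mentioned above, the following is a minimal sketch of the textbook conversion from linear prediction (LPC) coefficients to line spectral frequencies. It is not the extraction pipeline used in this work, and it does not attempt the BDD-LSF reduction. The function names `lpc_autocorr` and `lpc_to_lsf`, the 16 kHz sampling rate, the 25 ms Hamming-windowed frame, and the prediction order of 10 are all assumptions chosen for the example.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Return LPC coefficients [1, a_1, ..., a_p] for one frame.

    Uses the autocorrelation method with the Levinson-Durbin recursion.
    (Hypothetical helper for this sketch, not taken from the paper.)
    """
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsf(a, eps=1e-6):
    """Convert LPC coefficients [1, a_1, ..., a_p] to p LSFs in radians.

    Builds the symmetric polynomial P and antisymmetric polynomial Q,
    finds their roots, and keeps the root angles strictly inside (0, pi).
    (Hypothetical helper for this sketch, not taken from the paper.)
    """
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]  # P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = a_ext - a_ext[::-1]  # Q(z) = A(z) - z^-(p+1) A(1/z)
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    # Drop the trivial roots at z = 1 and z = -1 and the conjugate half.
    keep = (angles > eps) & (angles < np.pi - eps)
    return np.sort(angles[keep])


# Usage on a synthetic frame (assumed values: 16 kHz, 25 ms frame, order 10).
sample_rate = 16000
rng = np.random.default_rng(0)
frame = rng.standard_normal(400) * np.hamming(400)
lsf_rad = lpc_to_lsf(lpc_autocorr(frame, order=10))
lsf_hz = lsf_rad * sample_rate / (2.0 * np.pi)
print(np.round(lsf_hz, 1))
```

Each frame yields as many LSFs as the prediction order, sorted in increasing frequency. Closely spaced pairs of LSFs indicate spectral peaks, which is why the representation is often described as capturing formant structure.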