Efficient Modeling of Long Temporal Contexts for Continuous Emotion Recognition

Continuous emotion recognition is a challenging task because it requires modeling long-term contextual dependencies. Prior research has exploited emotional temporal context from two perspectives: feature representations and emotional models. In this paper, we explore model-based approaches to continuous emotion recognition. Specifically, three temporal models, LSTM, TDNN, and multi-head attention, are used to learn long-term contextual dependencies from short-term feature representations. The temporal information learned by these models allows the network to more easily exploit the slowly changing dynamics between emotional states. Our experimental results demonstrate that the temporal models capture long-term emotional dynamics effectively. The multi-head attention model achieves the best performance of the three, and combining multiple models further improves continuous emotion recognition significantly.
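As a rough illustration of the multi-head attention temporal model described above, the sketch below applies plain multi-head self-attention over a sequence of short-term frame features, so that each frame can attend to distant frames and pick up long-range emotional context. This is a minimal NumPy sketch assuming standard scaled dot-product attention; the sequence length, feature dimension, head count, and weight matrices here are illustrative placeholders, not the paper's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product multi-head self-attention over frame features.

    X: (T, d_model) short-term feature vectors for T frames.
    Wq, Wk, Wv, Wo: (d_model, d_model) projection matrices.
    Returns a (T, d_model) sequence in which every frame has attended
    to all other frames, i.e. long-term context is aggregated.
    """
    T, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    # Split the projections into heads: (num_heads, T, d_head).
    def split(M):
        return M.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, T)
    attn = softmax(scores, axis=-1)                        # rows sum to 1
    ctx = attn @ Vh                                        # (H, T, d_head)
    ctx = ctx.transpose(1, 0, 2).reshape(T, d_model)       # concat heads
    return ctx @ Wo

# Illustrative sizes: 100 frames of 64-dim short-term features, 4 heads.
rng = np.random.default_rng(0)
T, d_model, H = 100, 64, 4
X = rng.standard_normal((T, d_model))
Ws = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
Y = multi_head_self_attention(X, *Ws, num_heads=H)
print(Y.shape)  # (100, 64)
```

In a full system the attended sequence would typically be followed by a regression layer producing frame-level arousal/valence predictions; that head is omitted here for brevity.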