A New Interface for Affective State Estimation and Annotation from Speech

Emotion recognition from speech has been an active research area in recent years. The aim of this study is to predict emotion annotations more accurately for a robot/agent and thereby enable more human-like interaction between humans and robots/agents. To this end, emotion annotation, acoustic feature extraction, and spectrogram generation are carried out on human-human dyadic conversations. In the first study, statistical summaries of the acoustic features are matched with the corresponding annotations, and emotion recognition is performed using Support Vector Machines. In the second study, spectrogram segments obtained with a sliding window of fixed size and overlap are matched with the corresponding annotations, and a Convolutional Neural Network is trained on them. Finally, a user interface is designed that integrates the work described above. The resulting models are tested on the JESTKOD and CreativeIT databases through this interface and yield promising results toward human-like robots/agents.
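
The first pipeline (utterance-level statistical summaries classified with an SVM) could look roughly like the following minimal sketch. The feature choice (MFCCs via librosa), the summary statistics, and the SVM hyperparameters are illustrative assumptions, not the exact configuration used in the study.

```python
# Sketch of study 1: statistical summaries of acoustic features + SVM.
# Feature set (MFCC mean/std) and SVM settings are placeholder assumptions.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def summary_features(wav_path, sr=16000, n_mfcc=13):
    """Reduce frame-level MFCCs to one fixed-length summary vector per utterance."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)       # (n_mfcc, frames)
    # Mean and standard deviation over time summarize the whole utterance.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_svm(wav_paths, labels):
    """Fit an SVM classifier on utterance-level summary vectors."""
    X = np.stack([summary_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X, labels)
    return clf
```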
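
The second pipeline (overlapping spectrogram windows classified with a CNN) might be sketched as below. The mel-spectrogram parameters, window and hop sizes, and network depth are assumptions for illustration only and do not reflect the paper's actual architecture.

```python
# Sketch of study 2: sliding-window spectrogram segments + a small CNN.
# Window size, overlap, and layer choices are placeholder assumptions.
import numpy as np
import librosa
import tensorflow as tf

def spectrogram_windows(wav_path, sr=16000, n_mels=64, win_frames=128, hop_frames=64):
    """Slice a log-mel spectrogram into overlapping fixed-size windows."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)                             # (n_mels, frames)
    windows = [logmel[:, i:i + win_frames]
               for i in range(0, logmel.shape[1] - win_frames + 1, hop_frames)]
    return np.stack(windows)[..., np.newaxis]                     # (N, n_mels, win, 1)

def build_cnn(n_mels=64, win_frames=128, n_classes=4):
    """Small CNN mapping one spectrogram window to an emotion class."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_mels, win_frames, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
```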