Improving Speech-Based Human-Robot Interaction with Emotion Recognition

Several studies report successful results on employing socially assistive robots as interfaces in the assisted living domain. In this domain, a natural way to interact with robots is through speech. However, humans often use particular vocal intonations that can change the meaning of a sentence. For this reason, a socially assistive robot should be able to recognize the intended meaning of an utterance by reasoning on the combination of linguistic and acoustic analyses of the spoken sentence, so as to truly understand the user's feedback. We developed a probabilistic model that infers the intended meaning of a spoken sentence from the analysis of its linguistic content and from the output of a data-driven classifier that recognizes the valence and arousal of the speech prosody. The results showed that reasoning on the combination of the linguistic content and the acoustic features of the spoken sentence outperformed using the linguistic component alone.
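To make the fusion idea concrete, below is a minimal sketch of how linguistic polarity and prosodic valence could be combined in a naive Bayes-style probabilistic model. All intent labels, priors, and likelihood values are illustrative assumptions for this sketch, not the actual model or data used in the study.

```python
# Hypothetical sketch of late fusion of linguistic and prosodic evidence.
# All names, priors, and likelihood tables below are illustrative
# assumptions, not the authors' actual model.

INTENTS = ("positive_feedback", "negative_feedback")

# P(intent): uniform prior over the two intended meanings (assumption).
PRIOR = {"positive_feedback": 0.5, "negative_feedback": 0.5}

# P(linguistic_polarity | intent): likelihood of the polarity produced by
# a text-only linguistic analyser (illustrative values).
P_LING = {
    "positive_feedback": {"pos": 0.8, "neg": 0.2},
    # e.g. sarcasm: positive wording can carry negative intent
    "negative_feedback": {"pos": 0.4, "neg": 0.6},
}

# P(prosody_valence | intent): likelihood of the valence class output by
# the acoustic valence/arousal classifier (illustrative values).
P_PROS = {
    "positive_feedback": {"pos": 0.7, "neg": 0.3},
    "negative_feedback": {"pos": 0.2, "neg": 0.8},
}

def infer_intent(ling_polarity: str, prosody_valence: str) -> dict:
    """Naive Bayes fusion:
    P(intent | ling, pros) ∝ P(intent) * P(ling | intent) * P(pros | intent)
    """
    joint = {
        i: PRIOR[i] * P_LING[i][ling_polarity] * P_PROS[i][prosody_valence]
        for i in INTENTS
    }
    z = sum(joint.values())
    return {i: p / z for i, p in joint.items()}

# A sentence with positive words but negative prosody (e.g. an ironic
# "great, thanks"): the negative prosody shifts the posterior toward
# negative_feedback, which a text-only analysis would miss.
print(infer_intent("pos", "neg"))
# {'positive_feedback': ~0.43, 'negative_feedback': ~0.57}
```

The design point the sketch illustrates is late fusion: each modality contributes an independent likelihood term, so a mismatch between positive wording and negative prosody pulls the posterior away from the text-only interpretation.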