Towards Interactive Agents that Infer Emotions from Voice and Context Information

Conversational agents are increasingly used for social skills training. One of their most important capabilities is understanding the user's emotions, which enables natural interaction with humans. However, to infer a conversation partner's emotional state, humans typically also make use of contextual information. This work proposes an architecture that infers emotions from the human voice in combination with the contextual imprint of a particular situation, allowing a computer system to achieve a more human-like style of interaction. The architecture yields satisfactory results: the strategy of combining two algorithms, one covering 'common cases' and another covering 'borderline cases' (sketched below), significantly reduces the rate of classification errors, and the addition of context information further increases the accuracy of the emotion inferences.

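As a rough illustration of the two-stage strategy described above, the sketch below routes an utterance to a primary, voice-only classifier and falls back to a context-aware classifier whenever the primary confidence is low. All names, features, thresholds, and scoring rules here are hypothetical placeholders for illustration only, not the paper's actual models or data.

```python
# Hypothetical sketch of the two-stage classification strategy:
# a primary classifier handles 'common cases'; when its confidence is low
# (a 'borderline case'), a secondary classifier that also weighs context
# information makes the final call. Feature names and weights are illustrative.

from dataclasses import dataclass
from typing import Dict

EMOTIONS = ["neutral", "happy", "angry", "sad"]  # assumed label set


@dataclass
class Observation:
    voice_features: Dict[str, float]   # e.g. pitch, energy, speaking rate
    context: Dict[str, float]          # e.g. situational tension, prior turn valence


def primary_scores(obs: Observation) -> Dict[str, float]:
    """Stand-in for an acoustic-only classifier trained on voice features."""
    energy = obs.voice_features.get("energy", 0.5)
    pitch = obs.voice_features.get("pitch", 0.5)
    raw = {
        "angry":   energy * 0.7 + pitch * 0.3,
        "happy":   pitch * 0.6 + energy * 0.2,
        "sad":     (1.0 - energy) * 0.6,
        "neutral": 0.4,
    }
    total = sum(raw.values())
    return {label: score / total for label, score in raw.items()}


def secondary_scores(obs: Observation) -> Dict[str, float]:
    """Stand-in for the borderline-case classifier that also uses context."""
    scores = primary_scores(obs)
    tension = obs.context.get("tension", 0.0)      # contextual 'imprint' of the situation
    scores["angry"] *= 1.0 + tension               # context shifts the decision
    scores["neutral"] *= 1.0 - 0.5 * tension
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def classify(obs: Observation, confidence_threshold: float = 0.5) -> str:
    scores = primary_scores(obs)
    best = max(scores, key=scores.get)
    if scores[best] >= confidence_threshold:       # common case: accept primary output
        return best
    fallback = secondary_scores(obs)               # borderline case: consult context
    return max(fallback, key=fallback.get)


if __name__ == "__main__":
    obs = Observation(
        voice_features={"energy": 0.55, "pitch": 0.50},
        context={"tension": 0.8},
    )
    print(classify(obs))
```

In this sketch, the confidence threshold controls how often the context-aware fallback is consulted; in a real system it would be tuned on validation data rather than fixed by hand.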