Annotation of Utterances for Conversational Nonverbal Behaviors

Nonverbal behaviors play an important role in communication for both humans and social robots. However, adding contextually appropriate animations by hand is time-consuming and does not scale well. Previous researchers have developed automated systems for inserting animations based on utterance text, yet these systems lack a human's understanding of social context and are still being improved. This work proposes a middle ground in which untrained human workers label semantic information, and that labeling serves as input to an automatic system that produces appropriate gestures. To test this approach, untrained workers from Mechanical Turk labeled semantic information, specifically emotion and emphasis, for each utterance; these labels were then used to automatically add animations. Videos of a robot performing the animated dialogue were rated by a second set of participants. Results showed that untrained workers are capable of providing reasonable labels of semantic information and that emotional expressions derived from the labels were rated more highly than control videos. More study is needed to determine the effects of emphasis labels.
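
The pipeline described above, where per-utterance emotion and emphasis labels drive automatic animation insertion, can be illustrated with a minimal sketch. The label format, emotion-to-animation mapping, and tag names below are assumptions for illustration only, not the system used in the paper:

```python
# Illustrative sketch only: the label format, mapping, and tag names are
# assumptions, not the paper's actual system. It shows how per-utterance
# emotion and emphasis labels from crowd workers might be turned into
# simple animation tags for a robot's speech.

# Hypothetical mapping from crowd-sourced emotion labels to robot animations.
EMOTION_ANIMATIONS = {
    "happy": "smile_nod",
    "sad": "head_tilt_down",
    "angry": "brow_furrow",
    "neutral": None,
}

def annotate_utterance(text, emotion, emphasized_words):
    """Wrap an utterance with animation tags derived from worker labels."""
    tokens = []
    for word in text.split():
        # Mark emphasized words with a beat-gesture tag.
        if word.strip(".,!?").lower() in emphasized_words:
            tokens.append(f"<beat>{word}</beat>")
        else:
            tokens.append(word)
    tagged = " ".join(tokens)
    # Wrap the whole utterance in an emotion animation, if one applies.
    animation = EMOTION_ANIMATIONS.get(emotion)
    if animation:
        tagged = f"<anim name='{animation}'>{tagged}</anim>"
    return tagged

# Example: a single utterance labeled by a worker.
print(annotate_utterance(
    "That is wonderful news!",
    emotion="happy",
    emphasized_words={"wonderful"},
))
```

In practice such tags would feed a behavior realizer (e.g., a BML-style engine) rather than being printed, but the sketch captures the intended division of labor: humans supply the semantic judgment, the system supplies the animation.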
