Modeling and evaluating beat gestures for social robots

Natural gestures are a desirable feature for a humanoid robot, as they are presumed to make interaction with people more comfortable. With this aim in mind, we present in this paper a system for developing natural talking-gesture generation behavior. A Generative Adversarial Network (GAN) produces novel beat gestures from data captured from recordings of humans talking. The data are obtained without any kind of wearable device, as a motion capture system reliably estimates the positions of the limbs and joints involved in human expressive talking behavior. After testing on a Pepper robot, we show that the system is able to generate natural gestures over long talking periods without becoming repetitive. This approach is computationally more demanding than previous work, so a comparison is made to evaluate the improvements. The comparison calculates common measures of the end effectors' trajectories (jerk and path length), complemented by the Fréchet Gesture Distance (FGD), which measures the fidelity of the generated gestures with respect to the captured ones. Results show that the described system learns natural gestures purely by observation and improves on one developed with a simpler motion capture system. The quantitative results are supported by a questionnaire-based human evaluation.
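To make the evaluation concrete, the following sketch computes the three quantitative measures mentioned above: the path length and mean jerk of an end effector's trajectory, and an FID-style Fréchet Gesture Distance between sets of gesture embeddings. The function names and the assumption that the embeddings come from a pre-trained feature extractor (e.g., an autoencoder over motion sequences) are illustrative choices of ours, not details taken from the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def path_length(traj):
    """Total distance travelled by an end effector.

    traj: (T, 3) array of positions sampled at a fixed rate.
    """
    return float(np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1)))

def mean_jerk(traj, dt):
    """Mean magnitude of the third derivative of position,
    approximated with third-order finite differences."""
    jerk = np.diff(traj, n=3, axis=0) / dt ** 3
    return float(np.mean(np.linalg.norm(jerk, axis=1)))

def frechet_gesture_distance(real_feats, gen_feats):
    """Frechet distance between Gaussians fitted to real and
    generated gesture embeddings (same recipe as FID).

    real_feats, gen_feats: (N, D) arrays of gesture embeddings
    from a feature extractor (a hypothetical component here).
    """
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):  # discard numerical imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(cov_r + cov_g - 2.0 * covmean))
```

Lower jerk indicates smoother motion, while a lower FGD indicates that the distribution of generated gestures lies closer to that of the captured human gestures.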
