Spontaneous talking gestures using Generative Adversarial Networks

Abstract This paper presents a talking gesture generation system based on Generative Adversarial Networks, along with an evaluation of its adequacy. The system produces a sequence of joint positions for the robot’s upper body that keeps in step with an uttered sentence. The suitability of the approach is demonstrated on a real robot. In addition, the motion generation method is compared with other (non-deep) generative approaches in two steps. First, a statistical analysis of the movements generated by each approach is performed by means of Principal Coordinate Analysis. Second, the adequacy of the robot motion is measured by calculating the end effectors’ jerk, path lengths and 3D space coverage.
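As an illustration of two of the motion measures named in the abstract, the following minimal sketch computes a path length and an average jerk magnitude for a single end effector trajectory. It assumes the trajectory is a list of 3D positions sampled at a fixed time step `dt`, and it estimates jerk with a third-order finite difference. The function names, the sampling assumption and the choice of averaging the jerk norm are illustrative and are not taken from the paper, which may define these measures differently.

```python
import math


def path_length(points):
    """Sum of Euclidean distances between consecutive 3D positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def mean_jerk_norm(points, dt):
    """Average norm of the jerk, estimated by a third finite difference.

    Assumes uniformly sampled positions with time step dt (an assumption
    of this sketch, not a detail given in the abstract).
    """
    norms = []
    for p0, p1, p2, p3 in zip(points, points[1:], points[2:], points[3:]):
        # Third forward difference per coordinate, scaled by dt**3.
        jerk = [(d - 3 * c + 3 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        norms.append(math.sqrt(sum(x * x for x in jerk)))
    return sum(norms) / len(norms) if norms else 0.0


# Hypothetical trajectory: constant-velocity motion along the x axis.
line = [(float(t), 0.0, 0.0) for t in range(5)]
print(path_length(line), mean_jerk_norm(line, dt=1.0))
```

Under these assumptions a constant-velocity trajectory has zero jerk, so a lower value indicates smoother motion, while the path length reflects how far the end effector travels during the gesture.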
