A Speech-Driven Hand Gesture Generation Method and Evaluation in Android Robots

Hand gestures occur frequently in daily dialogue and serve important communicative functions. We first analyzed multimodal human–human dialogue data and found relations between the occurrence of hand gestures and dialogue act categories. We also conducted a clustering analysis of gesture motion data and associated text information with the resulting motion clusters through gesture function categories. Based on these analysis results, we proposed a speech-driven gesture generation method that takes text, prosody, and dialogue act information into account. We then implemented hand motion control on an android robot and evaluated the effectiveness of the proposed method through subjective experiments. The gesture motions generated by the proposed method were judged to be relatively natural even under the robot's hardware constraints.
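To make the described pipeline concrete, the following is a minimal Python sketch of how such a speech-driven gesture selector might be structured: a dialogue act is mapped to candidate gesture function categories, a motion is drawn from the corresponding cluster, and its onset is aligned with a prosodic cue. All names, the act-to-function mapping, and the toy data are hypothetical illustrations based on the abstract, not the authors' actual implementation.

```python
import random
from dataclasses import dataclass, field

@dataclass
class GestureCluster:
    """A cluster of recorded gesture motions sharing one gesture function."""
    function: str                                 # e.g. "iconic", "beat", "deictic"
    motions: list = field(default_factory=list)   # joint-angle trajectories

# Hypothetical association between dialogue act categories and gesture
# function categories, standing in for the corpus analysis results.
ACT_TO_FUNCTIONS = {
    "statement":   ["iconic", "beat"],
    "question":    ["deictic", "beat"],
    "backchannel": [],                            # gestures rarely accompany these
}

def select_gesture(dialogue_act, keywords, clusters, f0_peak_time):
    """Choose a motion trajectory and an onset time for one utterance.

    Returns (trajectory, onset_seconds), or None when no gesture applies.
    """
    for function in ACT_TO_FUNCTIONS.get(dialogue_act, []):
        cluster = clusters.get(function)
        if cluster and cluster.motions:
            # Text information (keywords) could narrow the choice within a
            # cluster; here we simply sample a representative motion.
            trajectory = random.choice(cluster.motions)
            # Align the gesture stroke with the prosodic (F0) peak.
            return trajectory, f0_peak_time
    return None

# Usage example with toy data: two clusters, one utterance.
clusters = {
    "beat":   GestureCluster("beat",   motions=[[0.0, 0.4, 0.1]]),
    "iconic": GestureCluster("iconic", motions=[[0.2, 0.8, 0.3]]),
}
print(select_gesture("statement", {"robot"}, clusters, f0_peak_time=0.9))
```

In a real system the selected trajectory would additionally be retargeted to the android's joint limits and actuation speed, which is where the hardware constraints mentioned above come into play.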
