Synchronized gesture and speech production for humanoid robots

We present a model that synchronizes expressive gestures with speech. The model, implemented on a Honda humanoid robot, can generate a full range of gesture types, including emblems, iconic and metaphoric gestures, deictic pointing, and beat gestures. Arbitrary input text is analyzed with a part-of-speech tagger and a text-to-speech engine to obtain timing information for the spoken words. Style tags can optionally be added to specify the level of excitement or topic changes. The text, combined with any tags, is then processed by several grammars, one per gesture type, to produce candidate gestures for each word of the text. The model then selects probabilistically among the gesture types based on the desired degree of expressivity. Once a gesture type is selected, a corresponding gesture template, consisting of trajectory curves that define the gesture, is retrieved. Speech timing patterns and style parameters modulate the shape of these curves before they are sent to the robot's whole-body control system. Evaluations of the model's parameters showed that observers could differentiate varying levels of expressiveness, excitement, and speech synchronization. Modifying gesture speed during trajectory tracking showed that faster speeds elicited positive associations such as happiness and excitement, while slower speeds elicited negative associations such as sadness or tiredness.
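To make the pipeline concrete, the sketch below illustrates the flow from tagged text to timed gesture commands: per-type grammars propose candidate gestures for each word, a probabilistic selector thins them according to an expressivity level, and each selected template is stretched to the word's spoken duration and scaled by an excitement parameter. This is a minimal illustration under assumed interfaces; the gesture types follow the paper, but the function names, the toy POS-based grammar rules, the hard-coded word timings (which would normally come from the TTS engine), and the amplitude-scaling rule are hypothetical stand-ins rather than the paper's actual implementation.

```python
import random
from dataclasses import dataclass

# Gesture taxonomy from the paper; everything else below is illustrative.
GESTURE_TYPES = ["emblem", "iconic", "metaphoric", "deictic", "beat"]

@dataclass
class GestureCandidate:
    gesture_type: str
    word_index: int
    template: str  # name of a stored trajectory-curve template

def candidate_gestures(tagged_words):
    """Toy per-type 'grammars': propose candidates from POS tags.

    The paper's grammars are richer and also consume optional style tags
    (excitement, topic change); this is only a stand-in.
    """
    candidates = []
    for i, (word, tag) in enumerate(tagged_words):
        if tag.startswith("NN"):   # nouns may carry iconic or deictic content
            candidates.append(GestureCandidate("iconic", i, f"iconic_{word}"))
            candidates.append(GestureCandidate("deictic", i, "point_generic"))
        if tag.startswith("VB"):   # verbs may suggest metaphoric gestures
            candidates.append(GestureCandidate("metaphoric", i, "process_sweep"))
        candidates.append(GestureCandidate("beat", i, "beat_small"))
    return candidates

def select_gestures(candidates, expressivity=0.5):
    """Probabilistic selection: higher expressivity keeps more non-beat gestures."""
    selected = []
    for c in candidates:
        keep_prob = expressivity if c.gesture_type != "beat" else 1.0 - expressivity
        if random.random() < keep_prob:
            selected.append(c)
    return selected

def modulate_template(template, word_onset_s, word_duration_s, excitement=0.0):
    """Stretch a template to the word's spoken duration and scale its amplitude.

    Onset and duration would come from the TTS engine's timing output;
    the amplitude rule here is a guessed placeholder.
    """
    return {
        "template": template,
        "start_s": word_onset_s,
        "duration_s": word_duration_s,
        "amplitude_scale": 1.0 + 0.5 * excitement,
    }

if __name__ == "__main__":
    # Tagged words and word timings would normally come from a POS tagger
    # and a text-to-speech engine; they are hard-coded here for illustration.
    tagged = [("the", "DT"), ("robot", "NN"), ("waves", "VBZ"), ("hello", "UH")]
    timings = [(0.0, 0.15), (0.15, 0.35), (0.5, 0.4), (0.9, 0.5)]

    cands = candidate_gestures(tagged)
    chosen = select_gestures(cands, expressivity=0.7)
    commands = [
        modulate_template(c.template, *timings[c.word_index], excitement=0.3)
        for c in chosen
    ]
    for cmd in commands:
        print(cmd)  # each command would be handed to the whole-body controller
```

In the actual system, the modulated trajectory curves are tracked by the robot's whole-body control layer; the dictionary returned here merely stands in for whatever command structure that controller expects.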
