Accurate Synchronization of Gesture and Speech for Conversational Agents using Motion Graphs

Multimodal representation of conversational agents requires accurate synchronization of gesture and speech. To this end, we identify the key issues in synchronization through a preliminary case study, use them as practical guidelines for our algorithm design, and propose a two-step synchronization approach. The case study reveals that two factors, duration and timing, play the central role when gesture is synchronized with speech manually. Treating synchronization as a motion synthesis problem rather than as the behavior scheduling problem adopted by conventional methods, we first apply a motion graph technique with constraints on gesture structure to achieve coarse synchronization, and then refine the result by shifting and scaling the motion. This approach synchronizes gesture and speech accurately in both duration and timing. We have confirmed that our system makes creating attractive content easier than producing content of equal quality manually. In addition, a subjective evaluation demonstrates that the proposed approach achieves more accurate synchronization and higher motion quality than a state-of-the-art method.
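
To make the two-step idea concrete, below is a minimal Python sketch of how coarse graph-based clip selection followed by shift-and-scale refinement could look. It is an illustration under stated assumptions, not the authors' implementation: gesture clips are assumed to carry an annotated stroke time and motion-graph successor links, and all names here (GestureClip, coarse_sync, refine) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GestureClip:
    name: str
    duration: float        # clip length in seconds
    stroke_time: float     # time of the gesture stroke within the clip
    successors: List[str]  # clips reachable via motion-graph transitions


def coarse_sync(graph: Dict[str, GestureClip], start: str,
                stroke_targets: List[float]) -> List[str]:
    """Step 1 (coarse): walk the motion graph, greedily choosing the
    successor whose stroke lands closest to the next speech stroke."""
    path, t, current = [start], 0.0, graph[start]
    for target in stroke_targets[1:]:
        t += current.duration  # concatenation time of the next clip
        best = min(current.successors,
                   key=lambda n: abs(t + graph[n].stroke_time - target))
        path.append(best)
        current = graph[best]
    return path


def refine(clip: GestureClip, target_stroke: float, target_duration: float,
           max_scale: float = 0.2) -> Tuple[float, float]:
    """Step 2 (fine): scale playback within a tolerance so the clip fits
    the speech segment, then shift it so the scaled stroke lands exactly
    on the speech stroke. Returns (new clip start time, time scale)."""
    raw_scale = target_duration / clip.duration
    scale = min(1.0 + max_scale, max(1.0 - max_scale, raw_scale))
    new_start = target_stroke - clip.stroke_time * scale
    return new_start, scale


if __name__ == "__main__":
    # Toy motion graph of three clips; stroke times of the speech would
    # come from an audio analysis step in a full pipeline.
    graph = {
        "beat_a": GestureClip("beat_a", 1.2, 0.5, ["beat_b", "rest"]),
        "beat_b": GestureClip("beat_b", 0.9, 0.4, ["beat_a", "rest"]),
        "rest":   GestureClip("rest",   0.6, 0.3, ["beat_a", "beat_b"]),
    }
    strokes = [0.5, 1.8, 2.9]
    print(coarse_sync(graph, "beat_a", strokes))
```

The design point this sketch reflects is the division of labor described in the abstract: the graph walk only needs to get durations roughly right, because the per-clip shift and bounded scale absorb the residual timing error.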
