Gesture controllers

We introduce gesture controllers, a method for animating the body language of avatars engaged in live spoken conversation. A gesture controller is an optimal-policy controller that schedules gesture animations in real time based on acoustic features in the user's speech. The controller consists of an inference layer, which infers a distribution over a set of hidden states from the speech signal, and a control layer, which selects the optimal motion based on the inferred state distribution. The inference layer, consisting of a specialized conditional random field, learns the hidden structure in body language style and associates it with acoustic features in speech. The control layer uses reinforcement learning to construct an optimal policy for selecting motion clips from a distribution over the learned hidden states. The modularity of the proposed method allows customization of a character's gesture repertoire, animation of non-human characters, and the use of additional inputs such as speech recognition or direct user control.
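The two-layer structure described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the log-linear observation model stands in for the specialized conditional random field, and a fixed Q-table stands in for the policy learned by reinforcement learning. All names and dimensions (`N_STATES`, `N_CLIPS`, `W`, `Q`, etc.) are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 4   # hidden body-language states (illustrative)
N_CLIPS = 6    # motion clips in the character's repertoire
N_FEATS = 3    # acoustic features per frame (e.g. pitch, intensity)

# Inference layer (stand-in for the CRF): a log-linear observation
# model plus a transition prior, run as forward filtering so the
# controller maintains a distribution over hidden states in real time.
W = rng.normal(size=(N_STATES, N_FEATS))           # feature weights
T = np.full((N_STATES, N_STATES), 1.0 / N_STATES)  # transition model

def infer_state_distribution(belief, features):
    """One forward-filtering step: propagate the belief through the
    transition model, then reweight by the observation potentials."""
    prior = T.T @ belief
    scores = W @ features
    obs = np.exp(scores - scores.max())  # softmax-style potentials
    post = prior * obs
    return post / post.sum()

# Control layer (stand-in for the learned optimal policy): a Q-table
# scores each (hidden state, clip) pair; the policy picks the clip
# with the highest expected value under the inferred distribution.
Q = rng.normal(size=(N_STATES, N_CLIPS))

def select_clip(belief):
    """Choose the clip maximizing expected Q-value under the belief."""
    return int(np.argmax(belief @ Q))

# Usage: filter a short stream of acoustic feature vectors and
# schedule a motion clip at each step.
belief = np.full(N_STATES, 1.0 / N_STATES)
for _ in range(5):
    feats = rng.normal(size=N_FEATS)
    belief = infer_state_distribution(belief, feats)
    clip = select_clip(belief)
```

The separation mirrors the modularity the abstract emphasizes: the control layer consumes only a distribution over hidden states, so the motion repertoire (here, the Q-table's columns) can be swapped without retraining the inference layer.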
