Affect-expressive hand gestures synthesis and animation

Speech and hand gestures form a composite communicative signal that enhances the naturalness and affective quality of communication. We present a multimodal framework for the joint analysis of continuous affect, speech prosody, and hand gestures, aimed at the automatic synthesis of realistic hand gestures from spontaneous speech using hidden semi-Markov models (HSMMs). To the best of our knowledge, this is the first attempt to synthesize hand gestures using a continuous dimensional affect space, i.e., activation, valence, and dominance. We model the relationship between acoustic features describing speech prosody and hand gestures, both with and without the continuous affect information, in speaker-independent configurations, and we evaluate the multimodal analysis framework objectively as well as by generating hand gesture animations. Our experimental results are promising and convey the role of affect in modeling the dynamics of the speech-gesture relationship.
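
The paper itself ships no code, but the core HSMM ingredient the abstract relies on, explicit state-duration modeling over joint prosody-affect-gesture features, is easy to sketch. The following minimal Python example is our own illustration, not the authors' implementation: the Poisson duration model, diagonal-Gaussian emissions, random parameters, and the greedy frame-wise decoder are all simplifying assumptions (a real system would learn parameters via EM and decode with a full HSMM Viterbi pass).

```python
import numpy as np

rng = np.random.default_rng(0)

class JointHSMM:
    """Minimal hidden semi-Markov model sketch for joint prosody-gesture
    modeling (illustrative only; not the paper's implementation).

    Each hidden state k has:
      - an explicit duration distribution (Poisson, mean dur_rate[k]),
      - a Gaussian emission over concatenated [prosody | affect | gesture]
        feature vectors (diagonal covariance for simplicity).
    """

    def __init__(self, n_states, dim_prosody, dim_affect, dim_gesture):
        self.n = n_states
        self.dp, self.da, self.dg = dim_prosody, dim_affect, dim_gesture
        d = dim_prosody + dim_affect + dim_gesture
        # Hypothetical parameters; in practice estimated via EM on
        # synchronized speech and motion-capture data.
        self.trans = np.full((n_states, n_states), 1.0 / n_states)
        self.dur_rate = rng.uniform(3, 8, size=n_states)  # mean duration (frames)
        self.means = rng.normal(size=(n_states, d))
        self.vars = np.ones((n_states, d))

    def synthesize(self, prosody, affect):
        """Greedy frame-wise sketch: pick the state whose prosody+affect
        likelihood is highest, hold it for a sampled duration, and emit
        that state's mean gesture features."""
        T = len(prosody)
        obs = np.hstack([prosody, affect])           # (T, dp+da)
        mu = self.means[:, : self.dp + self.da]      # prosody+affect slice
        var = self.vars[:, : self.dp + self.da]
        gestures = np.zeros((T, self.dg))
        t = 0
        while t < T:
            # Diagonal-Gaussian log-likelihood of frame t under each state.
            ll = -0.5 * (((obs[t] - mu) ** 2) / var
                         + np.log(2 * np.pi * var)).sum(axis=1)
            k = int(np.argmax(ll))
            # Explicit state duration: the defining semi-Markov ingredient.
            d = max(1, rng.poisson(self.dur_rate[k]))
            gestures[t : t + d] = self.means[k, self.dp + self.da :]
            t += d
        return gestures

# Toy usage: 100 frames of prosody (pitch, energy) with constant
# activation/valence/dominance affect values.
model = JointHSMM(n_states=8, dim_prosody=2, dim_affect=3, dim_gesture=6)
prosody = rng.normal(size=(100, 2))
affect = np.tile([0.5, 0.2, -0.1], (100, 1))
gest = model.synthesize(prosody, affect)
print(gest.shape)  # (100, 6) gesture feature trajectory
```

Holding each selected state for an explicitly sampled duration, rather than re-deciding every frame as an HMM's self-transitions would, is what makes the semi-Markov structure attractive for gesture phrases, which span many frames.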
