Speech-driven eyebrow motion synthesis with contextual Markovian models

Nonverbal communicative behaviors that accompany speech must be modeled for a virtual agent to sustain a natural and lively conversation with humans. We investigate statistical frameworks for learning the correlation between speech prosody and eyebrow motion features. Such methods can be used to automatically synthesize accurate eyebrow movements from synchronized speech.
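To make the prosody-to-motion mapping concrete, below is a minimal sketch of one common baseline for this task: a single Gaussian HMM trained on joint [prosody | eyebrow] frames, decoded from prosody alone at synthesis time. This is an assumption-laden simplification, not the paper's contextual Markovian model: it assumes the hmmlearn library, frame-aligned features, diagonal covariances, and invented feature layouts (e.g., log-F0 and energy as prosody, two eyebrow-displacement channels as motion).

```python
"""Sketch: speech-driven eyebrow synthesis with a joint Gaussian HMM.

NOT the paper's contextual Markovian model -- a simplified baseline
assuming hmmlearn, diagonal covariances, and hypothetical features.
"""
import numpy as np
from scipy.stats import multivariate_normal
from hmmlearn.hmm import GaussianHMM

N_PROSODY = 2   # e.g. log-F0 and energy per frame (assumed features)
N_MOTION = 2    # e.g. left/right eyebrow displacement (assumed features)

def train_joint_hmm(prosody, motion, n_states=8, seed=0):
    """Fit one Gaussian HMM on stacked [prosody | motion] frames."""
    joint = np.hstack([prosody, motion])           # (T, N_PROSODY + N_MOTION)
    model = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=50, random_state=seed)
    model.fit(joint)
    return model

def synthesize_eyebrows(model, prosody):
    """Viterbi-decode states from prosody alone, emit per-state motion means.

    With diagonal covariances, the motion block is conditionally independent
    of prosody within a state, so its conditional mean given prosody is just
    the per-state motion mean.
    """
    means = model.means_                           # (K, D)
    covars = model.covars_                         # (K, D, D), diag expanded
    p_means = means[:, :N_PROSODY]
    p_vars = np.array([np.diag(c)[:N_PROSODY] for c in covars])

    # Per-frame, per-state log-likelihood of the prosody observation.
    T, K = len(prosody), model.n_components
    loglik = np.empty((T, K))
    for k in range(K):
        loglik[:, k] = multivariate_normal.logpdf(
            prosody, mean=p_means[k], cov=np.diag(p_vars[k]))

    # Viterbi decoding in log space.
    log_trans = np.log(model.transmat_ + 1e-12)
    delta = np.log(model.startprob_ + 1e-12) + loglik[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans       # (from state, to state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + loglik[t]
    states = np.empty(T, dtype=int)
    states[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]

    return means[states, N_PROSODY:]               # (T, N_MOTION) trajectory

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prosody = rng.normal(size=(500, N_PROSODY))    # stand-in training data
    motion = rng.normal(size=(500, N_MOTION))
    model = train_joint_hmm(prosody, motion)
    print(synthesize_eyebrows(model, prosody[:100]).shape)  # (100, 2)
```

Emitting per-state means yields a piecewise-constant trajectory; a practical system would smooth it, for instance with dynamic-feature-based parameter generation as in HMM-based speech synthesis. Contextual Markovian models go further by conditioning the state parameters on external context variables rather than using fixed per-state Gaussians.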
