Evaluation of an image-based talking head with realistic facial expression and head motion

In this paper, we present an image-based talking head system that synthesizes flexible head motion and realistic facial expression accompanying speech, given arbitrary text input and control tags. The goal of facial animation synthesis is to generate lip-synchronized and natural animations. The talking head is evaluated both objectively and subjectively. The objective measurement assesses lip synchronization by matching the closures between the synthesized sequences and the real ones, since human viewers are very sensitive to closures, and getting the closures at the right time may be the most important objective criterion for creating the impression that lips and sound are synchronized. In the subjective tests, facial expression is evaluated by scoring the real and synthesized videos, and head movement is evaluated by scoring animations with flexible head motion against animations with repeated head motion. Experimental results show that the proposed objective measurement of lip closure is one of the most significant criteria for the subjective evaluation of animations, and that the animated facial expressions are subjectively indistinguishable from real ones. Furthermore, talking heads with flexible head motion are more realistic and lifelike than those with repeated head motion.
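The closure-matching idea can be sketched in code. The paper does not give its exact formula, so the aperture threshold, the frame tolerance, and the aperture representation below are illustrative assumptions: a frame counts as a closure when the measured lip aperture falls below a threshold, and a real closure counts as matched if a synthesized closure occurs within a small frame tolerance.

```python
# Hedged sketch of closure-based lip-sync scoring; threshold and
# tolerance values are assumptions, not the paper's parameters.

def closure_frames(apertures, threshold=0.1):
    """Indices of frames whose lip aperture falls below the closure threshold."""
    return [i for i, a in enumerate(apertures) if a < threshold]

def closure_match_rate(real, synth, threshold=0.1, tolerance=2):
    """Fraction of real closures matched by a synthesized closure
    within +/- tolerance frames."""
    real_closed = closure_frames(real, threshold)
    synth_closed = set(closure_frames(synth, threshold))
    if not real_closed:
        return 1.0
    matched = sum(
        1 for i in real_closed
        if any((i + d) in synth_closed for d in range(-tolerance, tolerance + 1))
    )
    return matched / len(real_closed)

# Example: per-frame lip apertures (0 = fully closed), hypothetical values
real_seq  = [0.5, 0.4, 0.05, 0.02, 0.3, 0.6, 0.04, 0.5]
synth_seq = [0.5, 0.05, 0.03, 0.2, 0.4, 0.6, 0.5, 0.06]
print(closure_match_rate(real_seq, synth_seq))  # → 1.0 (all real closures matched)
```

A score near 1.0 would indicate that the synthesized mouth closes at the right times relative to the real recording, which the paper identifies as the key cue for perceived lip-sound synchronization.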
