Realistic speech animation based on observed 3-D face dynamics

An efficient system for realistic speech animation is proposed. The system supports all steps of the animation pipeline, from the capture or design of 3-D head models up to the synthesis and editing of the performance. This pipeline is fully 3-D, which yields high flexibility in the use of the animated character. The realism of the facial deformations is underpinned by detailed, real 3-D face dynamics, observed at video frame rate for thousands of points on the faces of speaking actors. These dynamics are given a compact and intuitive representation via independent component analysis (ICA). Performances amount to trajectories through this ‘viseme space’. When asked to animate a face, the system replicates the ‘visemes’ that it has learned and adds the necessary co-articulation effects. Realism has been improved through comparisons with motion-captured ground truth. Faces for which no 3-D dynamics could be observed can nonetheless be animated: their visemes are adapted automatically to their physiognomy by localising the face in a ‘face space’.
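
To make the idea of a performance as a trajectory through ‘viseme space’ concrete, the following is a minimal sketch under stated assumptions, not the system's actual implementation. It assumes a face shape is a neutral shape plus a weighted sum of basis components (such as ICA would supply), and it uses plain linear interpolation between viseme key points as a stand-in for the co-articulation model, which the abstract does not specify. All names and the toy numbers (`NEUTRAL`, `BASIS`, `VISEME_KEYS`, `trajectory`, `reconstruct`) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Linearly interpolate between two points of equal dimension."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def trajectory(keys: Sequence[Sequence[float]], steps: int) -> List[Vector]:
    """Sample a piecewise-linear path through viseme-space key points.

    Each segment contributes `steps` samples (its end point excluded),
    and the final key point is appended once at the end.
    Assumption: linear interpolation replaces the real co-articulation model.
    """
    path: List[Vector] = []
    for a, b in zip(keys, keys[1:]):
        for i in range(steps):
            path.append(lerp(a, b, i / steps))
    path.append(list(keys[-1]))
    return path


def reconstruct(neutral: Sequence[float],
                basis: Sequence[Sequence[float]],
                coeffs: Sequence[float]) -> Vector:
    """Return neutral + sum_k coeffs[k] * basis[k] (flattened coordinates)."""
    shape = list(neutral)
    for c, component in zip(coeffs, basis):
        for j, d in enumerate(component):
            shape[j] += c * d
    return shape


# Toy data (hypothetical): 4 flattened coordinates, 2 basis components.
NEUTRAL = [0.0, 0.0, 0.0, 0.0]
BASIS = [[1.0, 0.0, -1.0, 0.0],
         [0.0, 2.0, 0.0, -2.0]]
# Three viseme key points expressed as coordinates in the 2-D viseme space.
VISEME_KEYS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# One reconstructed face shape per sampled point on the trajectory.
frames = [reconstruct(NEUTRAL, BASIS, c)
          for c in trajectory(VISEME_KEYS, steps=4)]
```

In this sketch the animation is edited by moving key points in the low-dimensional space rather than by touching individual face points, which is the sense in which such a representation is compact.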
