Paper information - Realistic speech animation based on observed 3-D face dynamics

Realistic speech animation based on observed 3-D face dynamics

An efficient system for realistic speech animation is proposed. The system supports all steps of the animation pipeline, from the capture or design of 3-D head models to the synthesis and editing of the performance. This pipeline is fully 3-D, which yields high flexibility in the use of the animated character. Real, detailed 3-D face dynamics, observed at video frame rate for thousands of points on the faces of speaking actors, underpin the realism of the facial deformations. These are given a compact and intuitive representation via independent component analysis (ICA). Performances amount to trajectories through this ‘viseme space’. When asked to animate a face, the system replicates the ‘visemes’ that it has learned and adds the necessary co-articulation effects. Realism has been improved through comparisons with motion-captured ground truth. Faces for which no 3-D dynamics could be observed can nonetheless be animated: their visemes are adapted automatically to their physiognomy by localising the face in a ‘face space’.
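
The statement that performances amount to trajectories through a low-dimensional ‘viseme space’ can be illustrated with a minimal sketch. The abstract does not specify the interpolation scheme or how co-articulation is modelled, so the code below assumes plain piecewise-linear interpolation between viseme coordinates and a linear reconstruction of the mesh (neutral shape plus a weighted sum of basis displacements). The basis vectors stand in for components that ICA might produce; they are hand-written toy values, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Linear blend between two points of equal dimension (t in [0, 1])."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def trajectory(keys: Sequence[Sequence[float]], steps_per_segment: int) -> List[Vector]:
    """Piecewise-linear path through key points in the low-dimensional space.

    Assumption: real co-articulation handling would be more elaborate
    than straight-line interpolation between viseme targets.
    """
    path: List[Vector] = []
    for a, b in zip(keys, keys[1:]):
        for s in range(steps_per_segment):
            path.append(lerp(a, b, s / steps_per_segment))
    path.append(list(keys[-1]))  # include the final viseme exactly
    return path


def reconstruct(neutral: Sequence[float],
                basis: Sequence[Sequence[float]],
                coords: Sequence[float]) -> Vector:
    """Return neutral + sum_k coords[k] * basis[k] as flattened vertex data."""
    out = list(neutral)
    for weight, component in zip(coords, basis):
        for i, displacement in enumerate(component):
            out[i] += weight * displacement
    return out


# Toy data: one vertex (x, y, z) and two hypothetical components.
NEUTRAL = [0.0, 0.0, 0.0]
BASIS = [
    [0.0, -1.0, 0.0],  # stand-in for a "jaw opening" component
    [0.0, 0.0, 1.0],   # stand-in for a "lip protrusion" component
]
VISEMES = {"rest": [0.0, 0.0], "a": [1.0, 0.0], "o": [0.5, 1.0]}

if __name__ == "__main__":
    keys = [VISEMES[name] for name in ("rest", "a", "o", "rest")]
    for coords in trajectory(keys, steps_per_segment=4):
        print(coords, reconstruct(NEUTRAL, BASIS, coords))
```

Each printed line pairs a point on the low-dimensional path with the vertex position it reconstructs, which is the sense in which editing a performance reduces to editing a curve in that space.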
