Visual speech, a trajectory in viseme space

Efficient, realistic face animation remains a challenge. This paper proposes a system that yields realistic animations for speech. It starts from real 3D face dynamics, observed at frame rate for thousands of points on the faces of speaking actors. When asked to animate a face, it replicates the visemes it has learned and adds the necessary coarticulation effects. The speech animation can be based on as few as 16 modes, extracted through independent component analysis from the observed face dynamics. Rather than being copied verbatim, the deformation fields that come with the different visemes are adapted to the shape of the given face. The face to be animated is localized in a face space in which the locations of the example faces are also known, and the visemes are adapted automatically according to its relative distances to these examples.

© 2003 Wiley Periodicals, Inc. Int J Imaging Syst Technol 13: 74–84, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ima.10044
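The distance-based adaptation of visemes can be illustrated with a small sketch. The abstract does not state how the relative distances are turned into blending weights, so the inverse-distance weighting used here is an assumption, and the function name `adapt_viseme`, the two-dimensional face space, and the toy deformation fields are hypothetical choices made only for illustration.

```python
import math


def adapt_viseme(target, examples, fields, eps=1e-9):
    """Blend example deformation fields for one viseme.

    target   -- face-space coordinates of the face to animate
    examples -- face-space coordinates of the example faces
    fields   -- one flattened deformation field per example face
    Weights are inverse distances in face space (an assumption,
    not a scheme confirmed by the abstract).
    """
    dists = [math.dist(target, e) for e in examples]

    # If the target coincides with an example, reuse that field as is.
    for d, f in zip(dists, fields):
        if d < eps:
            return list(f)

    inv = [1.0 / d for d in dists]
    total = sum(inv)
    weights = [w / total for w in inv]  # normalised to sum to 1

    n = len(fields[0])
    return [sum(w * f[i] for w, f in zip(weights, fields)) for i in range(n)]


# Hypothetical toy data: two example faces, a 3-value deformation field each.
examples = [(0.0, 0.0), (2.0, 0.0)]
fields = [[0.0, 0.0, 2.0], [2.0, 0.0, 0.0]]
blended = adapt_viseme((1.0, 0.0), examples, fields)
```

A target face midway between the two examples receives equal weights, so its adapted field is the average of the two example fields. A target that coincides with an example reuses that example's field unchanged.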
