Speech Animation Using Viseme Space

A method for realistic face animation is proposed, with a focus on speech animation. To animate a face, the system replicates the 3D 'visemes' it has learned from talking actors and adds the necessary coarticulation effects. The speech animation can be based on as few as 16 modes, extracted through Independent Component Analysis from different face dynamics. The system adapts the exact deformation fields associated with the different visemes to the shape of the given face. The face to be animated is localised in a face space in which the locations of the neutral example faces are also known, and the visemes are then adapted automatically according to the relative distances between that face and these examples.
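The abstract does not say how the relative distances are turned into an adapted viseme. As a minimal sketch of one plausible reading, the snippet below blends the example faces' deformation fields with normalised inverse-distance weights. The function name `adapt_viseme`, the flat-list representation of a deformation field, and the inverse-distance weighting itself are assumptions made for illustration, not the authors' stated method.

```python
import math


def adapt_viseme(target_coords, example_coords, example_deformations, eps=1e-9):
    """Blend example viseme deformations for a new face (hypothetical helper).

    target_coords:        face-space coordinates of the face to animate
    example_coords:       face-space coordinates of the neutral example faces
    example_deformations: one flat deformation vector per example face
    Assumption: weights are normalised inverse distances in face space.
    """
    dists = [math.dist(target_coords, c) for c in example_coords]

    # If the target coincides with an example face, reuse its deformation.
    for d, deformation in zip(dists, example_deformations):
        if d < eps:
            return list(deformation)

    inverse = [1.0 / d for d in dists]
    total = sum(inverse)
    weights = [w / total for w in inverse]

    # Weighted sum of the example deformation vectors, component by component.
    size = len(example_deformations[0])
    return [
        sum(w * deformation[i] for w, deformation in zip(weights, example_deformations))
        for i in range(size)
    ]
```

With two example faces and a target lying halfway between them in face space, this sketch gives each example equal weight, so the adapted deformation is their average. A target that sits exactly on an example face receives that example's deformation unchanged.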
