Modeling and synthesis of realistic visual speech in 3D

Realistic face animation remains a difficult problem, and this is holding back progress in several high-tech domains: special effects in film, the use of 3D face models in communications, avatars and likenesses in virtual reality, and the production of games with more subtle scenarios. This work attempts to improve on the current state of the art in face animation, especially the creation of highly realistic lip and speech-related motions. To that end, 3D models of faces are used, and speech-related 3D face motion is learned from examples. The chapter thus subscribes to the surging field of image-based modelling and widens its scope to include animation. The exploitation of detailed 3D motion sequences is quite unique and narrows the gap between modelling and animation. From measured 3D face deformations around the mouth area, typical motions are extracted for different "visemes." Visemes are the basic motion patterns observed during speech and are comparable to the phonemes of auditory speech. The visemes are studied in sufficient detail to also cover natural variations and differences between individuals. Furthermore, the transition between visemes is analysed in terms of co-articulation effects, i.e., the visual blending of visemes required for fluent, natural speech. The work presented in this chapter also encompasses the animation of faces for which no visemes have been observed and extracted: the "transplantation" of visemes to novel faces for which no viseme data have been recorded, and for which only a static 3D model is available, allows such faces to be animated without an extensive learning procedure for each individual.

Copyright © 2004, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.
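The abstract describes visemes as motion patterns extracted from measured 3D face deformations, and co-articulation as the visual blending of visemes in fluent speech. A minimal sketch of that idea, assuming each viseme is stored as a per-vertex displacement field over a face mesh and that co-articulation is approximated by a weighted mix of neighbouring visemes (all names and the toy mesh here are illustrative, not from the chapter):

```python
import numpy as np

def blend_visemes(neutral, visemes, weights):
    """Deform a neutral mesh by a weighted sum of viseme displacement
    fields -- a simple stand-in for co-articulation as visual blending.

    neutral : (N, 3) array of rest-pose vertex positions.
    visemes : dict mapping viseme name -> (N, 3) displacement field.
    weights : dict mapping viseme name -> blend weight for this frame.
    """
    blended = neutral.copy()
    for name, w in weights.items():
        blended += w * visemes[name]
    return blended

# Toy example: a 4-vertex "mouth" patch and two hypothetical visemes.
neutral = np.zeros((4, 3))
visemes = {
    "aa": np.array([[0.0, -1.0, 0.0]] * 4),  # jaw drops for /aa/
    "mm": np.array([[0.0, 0.5, 0.0]] * 4),   # lips press for /mm/
}
# A frame halfway through the /aa/-to-/mm/ transition.
frame = blend_visemes(neutral, visemes, {"aa": 0.5, "mm": 0.5})
```

In a real system the per-frame weights would come from the timing of the phoneme sequence (e.g. overlapping window functions centred on each viseme), rather than being fixed by hand as above.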
