Realistic face animation for speech

Realistic face animation is especially hard because we are all experts in perceiving and interpreting face dynamics. One approach is to simulate the facial anatomy. Alternatively, animation can be based on first observing the visible 3D dynamics, extracting the basic modes, and combining these according to the required performance. This paper follows the latter strategy and focuses on speech. The approach is a kind of bootstrap procedure. First, 3D shape statistics are learned from a talking face with a relatively small number of markers. A 3D reconstruction is produced at temporal intervals of 1/25 second. A topological mask of the lower half of the face is fitted to the motion. Principal component analysis (PCA) of the mask shapes then reduces the dimension of the mask shape space. The result is twofold. On the one hand, the face can be animated; in our case it can be made to speak new sentences. On the other hand, face dynamics can be tracked in 3D without markers for performance capture. Copyright © 2002 John Wiley & Sons, Ltd.
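
The PCA step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`fit_shape_pca`, `encode`, `decode`), the vertex count, the number of retained modes, and the synthetic data are all assumptions made for illustration. Each frame's fitted mask is flattened into one vector of vertex coordinates, and the leading principal directions of those vectors serve as the reduced shape space.

```python
import numpy as np


def fit_shape_pca(shapes, n_modes):
    """Fit PCA to mask shapes of shape (n_frames, n_vertices, 3).

    Returns the mean shape vector, the first n_modes principal
    directions (rows), and the fraction of variance they explain.
    """
    n_frames = shapes.shape[0]
    X = shapes.reshape(n_frames, -1)           # one flattened shape per row
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    modes = Vt[:n_modes]
    explained = float((S[:n_modes] ** 2).sum() / (S ** 2).sum())
    return mean, modes, explained


def encode(shape, mean, modes):
    """Project one (n_vertices, 3) shape onto the retained modes."""
    return modes @ (shape.reshape(-1) - mean)


def decode(coeffs, mean, modes):
    """Rebuild a (n_vertices, 3) shape from its mode coefficients."""
    return (mean + coeffs @ modes).reshape(-1, 3)


# Synthetic stand-in for tracked data (assumption): 100 frames, i.e. 4 s
# at 25 frames per second, of a 40-vertex mask driven by 3 hidden modes.
rng = np.random.default_rng(0)
n_frames, n_vertices, n_hidden = 100, 40, 3
rest = rng.normal(size=n_vertices * 3)
basis = rng.normal(size=(n_hidden, n_vertices * 3))
weights = rng.normal(size=(n_frames, n_hidden))
shapes = (rest + weights @ basis).reshape(n_frames, n_vertices, 3)

mean, modes, explained = fit_shape_pca(shapes, n_modes=3)
coeffs = encode(shapes[0], mean, modes)        # 3 numbers instead of 120
rebuilt = decode(coeffs, mean, modes)
max_error = float(np.abs(rebuilt - shapes[0]).max())
```

In this toy setting the data are generated from three modes, so three components reproduce each shape almost exactly. With real tracked faces the number of retained modes would be a trade-off between compactness and reconstruction error.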