An integrated framework for face modeling, facial motion analysis and synthesis

This paper presents an integrated framework for face modeling, facial motion analysis, and synthesis. The framework systematically addresses three closely related research issues: (1) selecting a quantitative visual representation for face modeling and face animation; (2) performing automatic facial motion analysis based on that same visual representation; and (3) modeling speech-to-face coarticulation. The framework provides a guideline for methodically building a face modeling and animation system; its systematic nature is reflected in the links among its components, whose details are presented. Based on this framework, we improved a face modeling and animation system called the iFACE system [4]. The final system provides functionality for customizing a generic face model for an individual, text-driven face animation, off-line speech-driven face animation, and real-time speech-driven face animation.
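To make the relationship between the framework's components concrete, the following is a minimal, hypothetical sketch of a text-driven animation path: a quantitative visual representation (here, a toy vector of mouth-shape parameters per viseme) combined with a crude coarticulation model (exponential smoothing across frames). All names, visemes, and parameter values are illustrative assumptions, not the paper's actual representation or coarticulation model.

```python
# Hypothetical sketch of the framework's pipeline; values are illustrative only.

# (1) Visual representation: each viseme maps to mouth-shape parameters,
# here just (jaw_open, lip_width) in normalized [0, 1] units.
VISEME_TARGETS = {
    "sil": (0.0, 0.5),   # silence: mouth closed, neutral width
    "aa":  (0.8, 0.6),   # open vowel
    "m":   (0.0, 0.5),   # bilabial closure
    "iy":  (0.2, 0.9),   # spread lips
}

def coarticulate(visemes, alpha=0.5):
    """(3) A toy coarticulation model: exponential smoothing, so each
    output frame is influenced by the preceding articulatory state."""
    frames = []
    state = VISEME_TARGETS["sil"]  # start from a neutral closed mouth
    for v in visemes:
        target = VISEME_TARGETS[v]
        # blend the new viseme target with the carried-over state
        state = tuple(alpha * t + (1 - alpha) * s
                      for t, s in zip(target, state))
        frames.append(state)
    return frames

frames = coarticulate(["sil", "aa", "m"])
```

A real system would drive a deformable 3D face model from such parameter trajectories; this sketch only shows why coarticulation smoothing prevents the mouth from snapping instantaneously between viseme targets.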

[1]  Thomas S. Huang,et al.  Explanation-based facial motion tracking using a piecewise Bezier volume deformation model , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[2]  Jörn Ostermann,et al.  User evaluation: Synthetic talking faces for interactive services , 1999, The Visual Computer.

[3]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[4]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[5]  D. Massaro,et al.  Perceiving Talking Faces , 1995 .

[6]  Tsuhan Chen,et al.  Networked Intelligent Collaborative Environment (NetICE) , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[7]  Henrique S. Malvar,et al.  Making Faces , 1998, SIGGRAPH.

[8]  Demetri Terzopoulos,et al.  Realistic modeling for facial animation , 1995, SIGGRAPH.

[9]  Hiroshi Harashima,et al.  A Media Conversion from Speech to Facial Image for Intelligent Man-Machine Interface , 1991, IEEE J. Sel. Areas Commun..

[10]  Raymond D. Kent,et al.  X‐ray microbeam speech production database , 1990 .

[11]  Emanuele Trucco,et al.  Geometric Invariance in Computer Vision , 1995 .

[12]  N. Badler,et al.  Linguistic Issues in Facial Animation , 1991 .

[13]  Christoph Bregler,et al.  Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.

[14]  Timothy F. Cootes,et al.  Active Shape Models-Their Training and Application , 1995, Comput. Vis. Image Underst..

[15]  Nadia Magnenat-Thalmann,et al.  MPEG-4 based animation with face feature tracking , 1999, Computer Animation and Simulation.

[16]  David Salesin,et al.  Synthesizing realistic facial expressions from photographs , 1998, SIGGRAPH.

[17]  Kiyoharu Aizawa,et al.  Analysis and synthesis of facial image sequences in model-based image coding , 1994, IEEE Trans. Circuits Syst. Video Technol..

[18]  Aapo Hyvärinen,et al.  Survey on Independent Component Analysis , 1999 .

[19]  Kiyoharu Aizawa,et al.  Model-based image coding , 1994, Other Conferences.

[20]  Demetri Terzopoulos,et al.  Analysis and Synthesis of Facial Image Sequences Using Physical and Anatomical Models , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[22]  Michael T. Chan,et al.  Automatic lip model extraction for constrained contour-based tracking , 1999, Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348).

[23]  Jacques de Villiers,et al.  New tools for interactive speech and language training: Using animated conversational agents in the classrooms of profoundly deaf children , 1999 .

[24]  Keith Waters,et al.  Computer facial animation , 1996 .

[25]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[26]  Lawrence Sirovich,et al.  Application of the Karhunen-Loeve Procedure for the Characterization of Human Faces , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[27]  Andrew Blake,et al.  Accurate, real-time, unadorned lip tracking , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[28]  Thomas S. Huang,et al.  Final Report To NSF of the Planning Workshop on Facial Expression Understanding , 1992 .

[29]  F. Lavagetto,et al.  Converting speech into lip movements: a multimedia telephone for hard of hearing people , 1995 .

[30]  Thaddeus Beier,et al.  Feature-based image metamorphosis , 1992, SIGGRAPH.

[31]  S. P. Mudur,et al.  Three-dimensional computer vision: a geometric viewpoint , 1993 .

[32]  Speech dialogue with facial displays , 1994, CHI '94.

[33]  Geoffrey Zweig,et al.  Speech Recognition with Dynamic Bayesian Networks , 1998, AAAI/IAAI.

[34]  Brian Wyvill,et al.  Speech and expression: a computer solution to face animation , 1986 .

[35]  Michael M. Cohen,et al.  Modeling Coarticulation in Synthetic Visual Speech , 1993 .

[36]  Thomas M. Levergood,et al.  DECface: an automatic lip-synchronization algorithm for synthetic faces , 1993 .

[37]  Anders Löfqvist,et al.  Speech as Audible Gestures , 1990 .

[38]  Alex Pentland,et al.  3D modeling and tracking of human lip motions , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[39]  Keith Waters,et al.  A muscle model for animating three-dimensional facial expression , 1987, SIGGRAPH.

[40]  Pertti Roivainen,et al.  3-D Motion Estimation in Model-Based Facial Image Coding , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[41]  Demetri Terzopoulos,et al.  Techniques for Realistic Facial Modeling and Animation , 1991 .

[42]  Shigeo Morishima,et al.  Real-time Talking Head Driven by Voice and its Application to Communication and Entertainment , 1998, AVSP.

[43]  Michael Isard,et al.  Learning to Track the Visual Motion of Contours , 1995, Artif. Intell..

[44]  Lance Williams,et al.  Performance-driven facial animation , 1990, SIGGRAPH.

[45]  Daniel Thalmann,et al.  Simulation of Facial Muscle Actions Based on Rational Free Form Deformations , 1992, Comput. Graph. Forum.

[46]  Alex Pentland,et al.  Coding, Analysis, Interpretation, and Recognition of Facial Expressions , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[47]  Kiyoharu Aizawa,et al.  An intelligent facial image coding driven by speech and phoneme , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[48]  Matthew Brand,et al.  Voice puppetry , 1999, SIGGRAPH.

[49]  John Lewis,et al.  Automated lip-sync: Background and techniques , 1991, Comput. Animat. Virtual Worlds.

[50]  Xueyin Lin,et al.  Mouth motion learning and generating from observation , 1998, 1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175).

[51]  James M. Rehg,et al.  Computer Vision for Human–Machine Interaction: Visual Sensing of Humans for Active Public Interfaces , 1998 .

[52]  Tsuhan Chen,et al.  Audio-visual integration in multimodal communication , 1998, Proc. IEEE.

[53]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Samy Bengio,et al.  Automatic speech recognition using dynamic bayesian networks with both acoustic and articulatory variables , 2000, INTERSPEECH.

[55]  Gregory M. Nielson,et al.  Scattered data modeling , 1993, IEEE Computer Graphics and Applications.

[56]  Thomas S. Huang,et al.  iFACE: A 3D Synthetic Talking Face , 2001, Int. J. Image Graph..

[57]  Lionel Revéret,et al.  A New 3D Lip Model for Analysis and Synthesis of Lip Motion In Speech Production , 1998, AVSP.

[58]  Marie-Luce Viaud,et al.  Facial animation with wrinkles , 1992 .

[59]  D. Massaro Speech Perception By Ear and Eye: A Paradigm for Psychological Inquiry , 1989 .

[60]  Jonas Beskow,et al.  Picture my voice: Audio to visual speech synthesis using artificial neural networks , 1999, AVSP.

[61]  Peter Eisert,et al.  Model-aided coding: a new approach to incorporate facial animation into motion-compensated video coding , 2000, IEEE Trans. Circuits Syst. Video Technol..

[62]  Saburo Tsuji,et al.  Recognizing human facial expressions in a potential field , 1994, Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No.94CH3440-5).

[63]  Alex Pentland,et al.  A three-dimensional model of human lip motions trained from video , 1997, Proceedings IEEE Nonrigid and Articulated Motion Workshop.

[64]  Ram R. Rao,et al.  Exploiting audio-visual correlation in coding of talking head sequences , 1996 .

[65]  Andrew Blake,et al.  Affine-invariant contour tracking with automatic control of spatiotemporal scale , 1993, 1993 (4th) International Conference on Computer Vision.

[66]  I. Jolliffe Principal Component Analysis , 2002 .

[67]  Christer Carlsson,et al.  DIVE - A platform for multi-user virtual environments , 1993, Comput. Graph..

[68]  Yochai Konig,et al.  "Eigenlips" for robust speech recognition , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[69]  Nadia Magnenat-Thalmann,et al.  Lip synchronization using linear predictive analysis , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[70]  Frederic I. Parke,et al.  A parametric model for human faces , 1974 .

[71]  Hani Yehia,et al.  Quantitative association of vocal-tract and facial behavior , 1998, Speech Commun..

[72]  Fabio Lavagetto,et al.  LIP movements synthesis using time delay neural networks , 1996, 1996 8th European Signal Processing Conference (EUSIPCO 1996).