Automatic face cloning and animation using real-time facial feature tracking and speech acquisition

We describe the components of a system for real-time facial communication using a cloned head. We begin by describing automatic face cloning from two orthogonal photographs of a person; the steps in this process are face model matching and texture generation. After an introduction to the MPEG-4 facial animation parameters we use, we explain facial feature tracking with a video camera. The technique requires an initialization step and is divided into mouth tracking and eye tracking, each of which is explained in detail. We then describe the speech-processing techniques used for real-time phoneme extraction and the subsequent speech animation module. We conclude with results and comments on the integration of these modules into a complete system.
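The MPEG-4 parameters mentioned above (facial animation parameters, FAPs) drive a face mesh by displacing feature-related vertices, with amplitudes expressed in face-proportional units (FAPUs). The following is a minimal illustrative sketch of that idea, not the authors' implementation: a single "open jaw"-style FAP value, scaled by a mouth-nose-separation FAPU, moves vertices in an influence region downward. The function name, weights, and tiny four-vertex "mesh" are all assumptions for illustration.

```python
# Hypothetical sketch of MPEG-4 FAP-style animation (not the paper's code):
# one parameter value, in FAPU units, displaces weighted vertices vertically.

def apply_open_jaw(vertices, influence, fap_value, mns):
    """Return new vertex positions after applying an open-jaw FAP.

    vertices  -- list of (x, y, z) tuples for the neutral face mesh
    influence -- {vertex_index: weight}, weight in [0, 1]
    fap_value -- FAP amplitude in FAPU units (positive opens the mouth)
    mns       -- mouth-nose separation FAPU, in model units
    """
    displacement = fap_value * mns  # convert FAPU units to model units
    out = []
    for i, (x, y, z) in enumerate(vertices):
        w = influence.get(i, 0.0)
        out.append((x, y - w * displacement, z))  # jaw moves down (-y)
    return out

# Tiny illustrative mesh: upper lip, lower lip, and two mouth corners.
neutral = [(0.0, 0.0, 0.0), (0.0, -0.1, 0.0), (0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]
influence = {1: 1.0, 2: 0.3, 3: 0.3}  # lower lip fully, corners partially
animated = apply_open_jaw(neutral, influence, fap_value=0.5, mns=0.2)
```

Because each FAP is a scalar stream, a tracker or phoneme-to-viseme module can update such parameters every frame while the mesh deformation stays local to the face model.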
