Audio driven facial animation for audio-visual reality

In this paper, we demonstrate a morphing-based, automated, audio-driven facial animation system. Given an incoming audio stream, a face image is animated with full lip synchronization and expression. From the audio stream and still pictures of a face articulating different visemes, an animation sequence is constructed using optical flow between visemes. Rules based on coarticulation and viseme duration control the continuity of the animation in terms of the shape and extent of lip opening. In addition, new viseme-expression combinations are synthesized so that animations with new facial expressions can be generated. Finally, various applications of the system are discussed in the context of creating audio-visual reality.
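
To make the timing side of such a pipeline concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the audio has already been segmented into timed visemes, and it only computes, for each output frame, which pair of viseme key images to morph between and with what blend weight. The names `Segment`, `extent`, `frame_schedule`, and `full_duration` are hypothetical, and the duration rule (short visemes reach only part of their full lip opening) is an illustrative stand-in for the rules mentioned in the abstract, which are not specified here. The optical-flow warp itself is omitted.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Segment:
    viseme: str   # hypothetical viseme label, e.g. "aa"
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds


def extent(seg, full_duration=0.1):
    """Assumed rule: a viseme shorter than full_duration reaches only a
    proportional fraction of its full lip opening."""
    return min(1.0, (seg.end - seg.start) / full_duration)


def frame_schedule(segments, fps=25.0, full_duration=0.1):
    """Return one (source, target, alpha, target_extent) tuple per frame.

    Key images are placed at segment midpoints. Between two midpoints,
    alpha rises linearly from 0 to 1; a renderer would warp the source
    image along alpha times the optical flow toward the target image.
    """
    centers = [(s.start + s.end) / 2.0 for s in segments]
    n_frames = int(segments[-1].end * fps) + 1
    schedule = []
    for f in range(n_frames):
        t = f / fps
        j = bisect.bisect_right(centers, t)  # number of midpoints <= t
        if j == 0:
            # Before the first midpoint: hold the first viseme.
            src = dst = segments[0]
            alpha = 0.0
        elif j == len(centers):
            # After the last midpoint: hold the last viseme.
            src = dst = segments[-1]
            alpha = 0.0
        else:
            src, dst = segments[j - 1], segments[j]
            alpha = (t - centers[j - 1]) / (centers[j] - centers[j - 1])
        schedule.append(
            (src.viseme, dst.viseme, alpha, extent(dst, full_duration))
        )
    return schedule


if __name__ == "__main__":
    # Made-up timings for illustration only.
    demo = [
        Segment("sil", 0.0, 0.5),
        Segment("aa", 0.5, 1.0),
        Segment("m", 1.0, 1.125),
    ]
    for row in frame_schedule(demo, fps=4.0, full_duration=0.25):
        print(row)
```

Placing key images at segment midpoints is one simple choice among several; a system with explicit coarticulation rules would likely adjust both the key times and the extent per viseme context.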
