Text2Video: text-driven facial animation using MPEG-4

We present a complete system for the automatic creation of talking-head video sequences from text messages. Our system converts the text into MPEG-4 Facial Animation Parameters (FAPs) and synthetic speech. A user-selected 3D character then performs lip movements synchronized with the speech. The 3D models, each created from a single image, range from realistic people to cartoon characters. The animation can be personalized by choosing a voice by language and gender and by applying a pitch shift. The result can be shown on displays and devices ranging from 3GPP players on mobile phones to real-time 3D render engines. Our system can therefore be used in mobile communication to convert regular SMS messages into MMS animations.
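
The abstract does not spell out how the text is turned into Facial Animation Parameters. As a minimal sketch, assuming the TTS engine reports each phoneme together with its duration, the snippet below maps SAMPA-style phoneme symbols onto the viseme table used by MPEG-4 FAP 1 (indices 0 to 14, where 0 means "none") and samples the phoneme track at a fixed video frame rate, yielding one viseme per frame. The input format, the `Phoneme` class, the 25 fps default and the function name `visemes_per_frame` are hypothetical illustrations, not the paper's implementation. A real system would also blend neighbouring visemes instead of switching abruptly between them.

```python
from dataclasses import dataclass

# Viseme table of MPEG-4 FAP 1, keyed by SAMPA-style phoneme symbols
# (symbol spelling is an assumption). Any symbol not listed here,
# e.g. silence, maps to viseme 0 ("none").
MPEG4_VISEMES = {
    "p": 1, "b": 1, "m": 1,
    "f": 2, "v": 2,
    "T": 3, "D": 3,
    "t": 4, "d": 4,
    "k": 5, "g": 5,
    "tS": 6, "dZ": 6, "S": 6,
    "s": 7, "z": 7,
    "n": 8, "l": 8,
    "r": 9,
    "A:": 10,
    "e": 11,
    "I": 12,
    "Q": 13,
    "U": 14,
}


@dataclass
class Phoneme:
    symbol: str       # phoneme symbol (hypothetical TTS output format)
    duration_ms: int  # phoneme duration in milliseconds


def visemes_per_frame(phonemes: list[Phoneme], fps: int = 25) -> list[int]:
    """Return one MPEG-4 viseme index per video frame (hypothetical helper)."""
    # Turn the durations into consecutive (start_ms, end_ms, viseme) intervals.
    intervals = []
    t = 0
    for ph in phonemes:
        intervals.append((t, t + ph.duration_ms, MPEG4_VISEMES.get(ph.symbol, 0)))
        t += ph.duration_ms

    frames = []
    n_frames = t * fps // 1000  # only whole frames that fit in the utterance
    i = 0
    for f in range(n_frames):
        center = (f + 0.5) * 1000 / fps  # time at the centre of frame f
        # Intervals are sorted, so advance monotonically to the one holding `center`.
        while i < len(intervals) - 1 and center >= intervals[i][1]:
            i += 1
        frames.append(intervals[i][2])
    return frames
```

For example, a 400 ms track of silence (80 ms), "m" (80 ms), "A:" (160 ms) and "p" (80 ms) sampled at 25 fps gives ten frames, and `visemes_per_frame` returns `[0, 0, 1, 1, 10, 10, 10, 10, 1, 1]`. Sampling at frame centres keeps the lip shapes aligned with the audio timeline produced by the TTS engine.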
