Realistic facial animation system for interactive services

This paper presents the optimization of the parameters of a talking head for web-based applications, such as Newsreader and E-commerce, in which a realistic talking head initiates a conversation with users. Our talking head system consists of two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, composed of a personalized 3D mask and a large database of mouth images with their related information. The synthesis part generates facial animation by concatenating appropriate mouth images from the database. A critical issue in synthesis is unit selection, which chooses mouth images from the database so that they match the words spoken by the talking head. To achieve realistic facial animation, the unit selection has to be optimized. This paper proposes objective criteria and uses Pareto optimization to train the unit selection. Subjective tests are carried out in our web-based evaluation system. Experimental results show that most people cannot distinguish our facial animations from real videos.
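The idea behind Pareto optimization can be illustrated with a minimal sketch. It assumes that each candidate parameter setting of the unit selection has already been scored on several objective criteria, and that lower scores are better. The function names, the two-objective layout, and the numeric scores are hypothetical and are not taken from the paper; the sketch shows only the generic dominance test, not the authors' training procedure.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of the points that no other point dominates."""
    front = []
    for i, p in enumerate(points):
        dominated = any(dominates(q, p) for j, q in enumerate(points) if j != i)
        if not dominated:
            front.append(i)
    return front


# Hypothetical scores: one tuple per candidate parameter setting,
# one entry per objective criterion (assumed: lower is better).
candidates = [
    (0.30, 0.50),
    (0.25, 0.60),
    (0.40, 0.40),
    (0.35, 0.55),  # dominated by (0.30, 0.50)
    (0.25, 0.70),  # dominated by (0.25, 0.60)
]

front = pareto_front(candidates)
print(front)
```

The settings left on the front are the ones for which no other setting is at least as good on every criterion; choosing among them still requires a trade-off between the criteria.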
