Near-videorealistic synthetic visual speech using non-rigid appearance models

We present work towards videorealistic synthetic visual speech using non-rigid appearance models. These models are used to track a talking face enunciating a set of training sentences. The resultant parameter trajectories are used in a concatenative synthesis scheme, where samples of original data are extracted from a corpus and concatenated to form new unseen sequences. Here we explore the effect on the synthesiser output of blending several synthesis units considered similar to the desired unit. We present preliminary subjective and objective results used to judge the realism of the system.
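
The blending step described above can be illustrated with a minimal sketch. The abstract does not state how similar units are combined, so the weighting scheme here (weights inversely proportional to a per-candidate dissimilarity cost) is an assumption for illustration only, as are the names `blend_units` and `concatenate` and the assumption that candidate units have already been time-aligned to the same number of frames.

```python
from typing import List, Sequence

# A synthesis unit is a trajectory: one appearance-model parameter vector per frame.
Trajectory = List[List[float]]


def blend_units(candidates: Sequence[Trajectory],
                costs: Sequence[float],
                eps: float = 1e-6) -> Trajectory:
    """Blend candidate units into one unit by a weighted average.

    Assumption (not from the paper): each candidate's weight is inversely
    proportional to its dissimilarity cost, and all candidates have the
    same number of frames and parameters.
    """
    if not candidates or len(candidates) != len(costs):
        raise ValueError("need one cost per candidate unit")
    n_frames = len(candidates[0])
    n_params = len(candidates[0][0])
    if any(len(unit) != n_frames for unit in candidates):
        raise ValueError("candidate units must have equal length")

    # Convert costs to normalised weights that sum to one.
    raw = [1.0 / (c + eps) for c in costs]
    total = sum(raw)
    weights = [w / total for w in raw]

    # Weighted average of every parameter at every frame.
    return [
        [sum(w * unit[t][p] for w, unit in zip(weights, candidates))
         for p in range(n_params)]
        for t in range(n_frames)
    ]


def concatenate(units: Sequence[Trajectory]) -> Trajectory:
    """Join blended units end to end to form a new parameter sequence."""
    sequence: Trajectory = []
    for unit in units:
        sequence.extend(unit)
    return sequence
```

With equal costs the blend reduces to a plain mean of the candidates. A practical system would also need to smooth the joins between concatenated units, which this sketch omits.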
