Towards video realistic synthetic visual speech

In this paper we present initial work towards a video-realistic visual speech synthesiser based on statistical models of shape and appearance. The synthesised image sequence for an utterance is formed by concatenating synthesis units (in this case phonemes) drawn from a pre-recorded corpus of training data. A smoothing spline is applied to the concatenated model parameters to ensure smooth transitions between frames, and the resulting parameters are then applied to the model to synthesise the image sequence. Early results look promising.
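
The abstract does not give the spline formulation or the layout of the model parameters, so the following is only a minimal sketch under stated assumptions. It assumes each phoneme unit is stored as a list of per-frame model parameter vectors, and in place of the paper's smoothing spline it uses a discrete second-difference penalised smoother (a Whittaker smoother), which is a close discrete analogue of a cubic smoothing spline on equally spaced frames. The names `concatenate_units`, `whittaker_smooth`, `smooth_trajectory`, the smoothing weight `lam` and the example unit data are all hypothetical.

```python
from typing import Dict, List

Frame = List[float]  # one frame of (hypothetical) model parameters


def concatenate_units(units: Dict[str, List[Frame]], phonemes: List[str]) -> List[Frame]:
    """Join the stored parameter trajectories of each phoneme, in order."""
    frames: List[Frame] = []
    for p in phonemes:
        frames.extend(list(f) for f in units[p])
    return frames


def whittaker_smooth(y: List[float], lam: float) -> List[float]:
    """Minimise sum((y - z)**2) + lam * sum((second difference of z)**2).

    Assumption: this discrete smoother stands in for the paper's smoothing
    spline. The minimiser solves (I + lam * D^T D) z = y, where D is the
    second-difference operator.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    n = len(y)
    if n < 3 or lam == 0.0:
        return list(y)
    # Dense system matrix A = I + lam * D^T D.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    coef = (1.0, -2.0, 1.0)  # row k of D touches frames k, k+1, k+2
    for k in range(n - 2):
        for r in range(3):
            for c in range(3):
                a[k + r][k + c] += lam * coef[r] * coef[c]
    # Gaussian elimination; A is symmetric positive definite for lam > 0,
    # so no row swaps are needed, and its bandwidth is 2 (pentadiagonal).
    rhs = list(y)
    for col in range(n):
        for row in range(col + 1, min(col + 3, n)):
            f = a[row][col] / a[col][col]
            for c in range(col, n):
                a[row][c] -= f * a[col][c]
            rhs[row] -= f * rhs[col]
    # Back substitution.
    z = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = rhs[row] - sum(a[row][c] * z[c] for c in range(row + 1, n))
        z[row] = s / a[row][row]
    return z


def smooth_trajectory(frames: List[Frame], lam: float) -> List[Frame]:
    """Smooth every model parameter independently over time."""
    if not frames:
        return []
    n_params = len(frames[0])
    tracks = [whittaker_smooth([f[j] for f in frames], lam) for j in range(n_params)]
    return [[tracks[j][t] for j in range(n_params)] for t in range(len(frames))]


# Hypothetical two-parameter units for the phonemes /h/ and /e/.
units = {"h": [[0.0, 1.0], [0.1, 1.0]], "e": [[1.0, 0.0], [1.2, 0.0]]}
frames = concatenate_units(units, ["h", "e"])
smoothed = smooth_trajectory(frames, lam=2.0)
```

Each parameter is smoothed independently. A larger `lam` gives smoother transitions across unit boundaries at the cost of fidelity to the recorded frames, while `lam = 0` returns the concatenated frames unchanged. Because the penalty acts on second differences, a parameter that already changes linearly over time passes through unaltered. The dense solve here is only suitable for short sequences; a dedicated banded solver would scale better.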
