Videorealistic talking faces: a morphing approach

We present a method for the construction of a videorealistic text-to-audiovisual speech synthesizer. A visual corpus of a subject enunciating a set of key words is initially recorded. The key words are chosen so that they collectively contain most of the American English viseme images, which are subsequently identified and extracted from the data by hand. Next, using optical flow methods borrowed from the computer vision literature, we compute realistic transitions from every viseme to every other viseme. The images along these transition paths are generated using a morphing method. Finally, we exploit phoneme and timing information extracted from a text-to-speech synthesizer to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a videorealistic talking face.
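The morphing step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a dense optical flow field between the two viseme images is already available (e.g. from an off-the-shelf flow estimator), and it uses a simple nearest-neighbour forward warp, which leaves holes that a real morphing system would fill.

```python
import numpy as np

def forward_warp(img, flow, t):
    """Splat each pixel of `img` a fraction `t` of the way along `flow`.

    `flow[y, x]` is the (dx, dy) displacement of pixel (x, y).
    Nearest-neighbour splatting is a simplification: it can leave
    holes and overwrite pixels, which a real morpher would handle.
    """
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xd = np.clip(np.round(xs + t * flow[..., 0]).astype(int), 0, w - 1)
    yd = np.clip(np.round(ys + t * flow[..., 1]).astype(int), 0, h - 1)
    out = np.zeros_like(img)
    out[yd, xd] = img[ys, xs]
    return out

def morph(img_a, img_b, flow_ab, alpha):
    """One intermediate frame at time `alpha` in [0, 1] between two visemes.

    Warp viseme A forward by alpha * flow, warp viseme B backward by
    (1 - alpha) * flow, then cross-dissolve the two warped images.
    """
    warped_a = forward_warp(img_a, flow_ab, alpha)
    warped_b = forward_warp(img_b, -flow_ab, 1.0 - alpha)
    return (1.0 - alpha) * warped_a + alpha * warped_b
```

Sweeping `alpha` from 0 to 1 at the rate dictated by the phoneme timing then yields the frames along one viseme-to-viseme transition path.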