From speech to 3D face animation

In this paper we present a new method to animate the face of a speaking avatar (i.e., a synthetic 3D human face) such that it realistically pronounces any given text, based on the audio alone. The lip movements in particular must be rendered carefully and synchronised precisely with the audio to obtain a realistic-looking result, from which it should in principle be possible to understand the spoken sentence by lip reading. Since such a system requires minimal bandwidth and relatively little computation, it could be used, for example, to transmit video-conferencing data over a very low-bandwidth channel: only the audio channel is transmitted, and the lip motion is rendered at the receiving end. In the extreme case, it would even suffice to transmit only an orthographic or phonetic transcription of the text, together with precise phoneme timing information.
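To give a rough sense of how little data a transcription-based channel of this kind would need, the following sketch encodes a fragment of speech as a list of timed phonemes and measures the serialised payload. The phoneme symbols, timings, and encoding are purely illustrative assumptions, not part of the method described in the paper:

```python
import json

# Hypothetical timed-phoneme transcript for the word "hello":
# each entry is (phoneme symbol, start time in ms, duration in ms).
transcript = [
    ("HH", 0, 60),
    ("AH", 60, 80),
    ("L", 140, 70),
    ("OW", 210, 190),
]

# Serialise for transmission; the receiver would drive the avatar's
# lip animation from these timings instead of from video frames.
payload = json.dumps(transcript).encode("utf-8")

print(len(payload))  # payload size in bytes: tens of bytes per word,
                     # versus kilobytes per frame for compressed video
```

Even an uncompressed textual encoding like this stays in the tens of bytes per word, which illustrates why phoneme-level transmission is attractive for very low-bandwidth links.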