From speech to 3D face animation

In this paper we present a new method to animate the face of a speaking avatar (i.e., a synthetic 3D human face) such that it realistically pronounces any given text, based on the audio alone. The lip movements in particular must be rendered carefully and synchronised precisely with the audio to obtain a realistic-looking result, from which it should in principle be possible to understand the spoken sentence by lip reading. Since such a system requires minimal bandwidth and relatively little computation, it could be used, for example, to transmit video-conferencing data over a very low-bandwidth channel: only the audio channel is transmitted, and the lip motion is rendered at the receiving end. In the extreme case, it would even suffice to transmit only an orthographic or phonetic transcription of the text, together with precise phoneme timing information.
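To give a rough sense of how little data a transcription-based channel of this kind would need, the following sketch encodes a fragment of speech as a list of timed phonemes and measures the serialised payload. The phoneme symbols, timings, and encoding are purely illustrative assumptions, not part of the method described in the paper:

```python
import json

# Hypothetical timed-phoneme transcript for the word "hello":
# each entry is (phoneme symbol, start time in ms, duration in ms).
transcript = [
    ("HH", 0, 60),
    ("AH", 60, 80),
    ("L", 140, 70),
    ("OW", 210, 190),
]

# Serialise for transmission; the receiver would drive the avatar's
# lip animation from these timings instead of from video frames.
payload = json.dumps(transcript).encode("utf-8")

print(len(payload))  # payload size in bytes: tens of bytes per word,
                     # versus kilobytes per frame for compressed video
```

Even an uncompressed textual encoding like this stays in the tens of bytes per word, which illustrates why phoneme-level transmission is attractive for very low-bandwidth links.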