Automatic synthesis of moving facial images with expression and mouth shape controlled by text

Recently, a new type of coding method, called model-based image coding, has gained much attention as a basis for future visual services. One of the interesting and important applications of the model-based coding technique is media conversion, which can be used to convert between different media such as text, speech, and images, and also to synthesize moving facial images efficiently. This paper presents a new method for synthesizing moving facial images whose expression and mouth shape are both controlled by input text. The input text consists of two parts: one contains control words describing expression, head motion, and blinking; the other contains the speech text and control words for speech synthesis by rule. Mouth-shape features and the duration of each phoneme are derived by rules from the results of the speech synthesis. These are combined with the parameters for expression and head movement to synthesize the final moving facial images, whose mouth shapes are completely synchronized with the synthetic speech. The advantage of the proposed method is that it makes it possible to synthesize a natural facial image sequence easily and efficiently, without the tedious process of manually defining values for the many kinds of parameters involved.
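To make the pipeline in the abstract concrete, the following is a minimal illustrative sketch in Python of how rule-derived phoneme durations and a mouth-shape (viseme) table could be expanded into per-frame animation parameters alongside an expression control word. The table values, function names, and the fixed-duration timing rule are all hypothetical assumptions for illustration, not the authors' actual rules.

```python
# Illustrative sketch (not the paper's implementation): expand a phoneme
# sequence plus an expression control word into per-frame parameters so
# that mouth shapes stay aligned with the synthetic-speech timing.

from dataclasses import dataclass

# Hypothetical mapping: phoneme -> ((mouth opening, mouth width), duration in ms).
VISEME_TABLE = {
    "a": ((0.9, 0.5), 120),
    "i": ((0.3, 0.9), 100),
    "u": ((0.4, 0.2), 100),
    "m": ((0.0, 0.4), 80),
    "sil": ((0.0, 0.3), 60),  # silence / rest shape
}

@dataclass
class Frame:
    time_ms: int
    mouth: tuple      # mouth-shape features in effect for this frame
    expression: str   # expression control word (e.g. "smile")

def synthesize_frames(phonemes, expression="neutral", fps=30):
    """Walk the phoneme sequence, holding each mouth shape for its
    rule-derived duration, and emit one parameter set per video frame."""
    frames, t = [], 0
    step = 1000 // fps
    for ph in phonemes:
        mouth, duration = VISEME_TABLE.get(ph, VISEME_TABLE["sil"])
        end = t + duration
        while t < end:  # one frame per video tick within this phoneme
            frames.append(Frame(t, mouth, expression))
            t += step
    return frames

# Example: input text like "[smile] ami" would split into the control
# word "smile" and the phoneme string for the speech text.
if __name__ == "__main__":
    for frame in synthesize_frames(["a", "m", "i"], expression="smile")[:5]:
        print(frame)
```

Because each frame's mouth shape is taken directly from the phoneme active at that instant, the image sequence remains synchronized with the synthetic speech by construction, which is the property the abstract emphasizes.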