Computer Graphics Animations of Talking Faces Based on Stochastic Models

Animated computer graphics displays of the visible speech gestures of the human face have a number of potential applications. This paper describes a novel method for their creation that brings together two statistically based techniques, namely hidden Markov modelling and principal component analysis. The animations are derived from images of a real speaker's face and incorporate all the visible features of the primary articulators, including the lips, teeth and tongue, in a graphical display which does not use an artificial facial model. A pilot 'video speech synthesiser' of this kind has been implemented and tested on spoken digit strings.

Fully rendered facial graphics, including the display of perceptually significant areas such as the tongue, are well suited to the development of video speech synthesisers. However, this form of computer graphics animation involves large numbers of control points whose time-varying behaviour is difficult to define accurately or to derive from measurements of real speakers. The problem is compounded by a) the need to measure and model features like the teeth and tongue, which can convey significant visual cues (McGrath et al., 1984), and b) the known variability and phonetic context sensitivity of visible speech gestures (Benguerel & Pichora-Fuller, 1982; Cohen & Massaro, in press).
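The combination outlined above can be sketched in miniature. The fragment below is not the paper's implementation; it is a hedged illustration of the principal-component step only, using synthetic data and assumed sizes (50 frames of 16x16 grey-level images, 8 retained components). It shows how vectorised face images can be compressed to a handful of coefficients, whose frame-by-frame trajectories are the kind of compact parameters a hidden Markov model could then be trained on.

```python
import numpy as np

# Illustrative sketch only: PCA over vectorised face images.
# All sizes here are assumptions, not values from the paper.
rng = np.random.default_rng(0)

# Stand-in training set: 50 "frames", each a flattened 16x16 image.
frames = rng.random((50, 256))

# Centre the data and obtain principal components via SVD.
mean_face = frames.mean(axis=0)
centred = frames - mean_face
U, S, Vt = np.linalg.svd(centred, full_matrices=False)

k = 8                       # assumed number of components retained
components = Vt[:k]         # (k, 256) orthonormal image-space basis

# Encode: each frame becomes k coefficients -- the compact visual
# parameters whose time-varying behaviour an HMM could model.
codes = centred @ components.T          # shape (50, k)

# Decode: approximate each frame from its k coefficients, which is
# the step that would regenerate a displayable image at synthesis time.
recon = codes @ components + mean_face  # shape (50, 256)
```

Keeping only the leading components discards low-variance detail, so the reconstruction is an approximation; the design trade-off is between the number of coefficients the temporal model must handle and the visual fidelity of the regenerated frames.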