Speech recognition for acoustic-assisted video coding and animation

In this paper, we discuss issues related to analysis and synthesis of facial images using speech information. An approach to speaker independent acoustic-assisted image coding and animation is studied. A perceptually based sliding window encoder is proposed. It utilizes the high rate (or oversampled) acoustic viseme sequence from the audio domain for image domain viseme interpolation and smoothing. The image domain visemes in our approach are dynamically constructed from a set of basic visemes. The look-ahead and look-back moving interpolations in the proposed approach provide an effective way to compensate the mismatch between auditory and visual perceptions.