Real-time head and facial motion analysis for model-based video coding

This dissertation introduces an algorithm for automatic head and facial feature tracking using a model-based approach. The input to the system is a 2D video sequence of a head-and-shoulders scene; the output is the trajectories of salient facial features, an estimate of the 3D motion of the head, and the corresponding Facial Action Parameters from MPEG-4 SNHC. These parameters encode the facial motion in the video sequence and are used to animate a 3D computer graphics head model so that it mimics the movements and expressions of the user. For feature tracking, the problems of localization accuracy and error accumulation are addressed by using the underlying 3D model to compute, for each video frame, optimal templates for the feature-tracking module. A minimization procedure, combined with a Kalman filter, processes the computed 3D motion and provides pose estimates for subsequent frames. The algorithm has been tested on synthetic and real sequences and is shown to produce accurate results over more than 100 frames at approximately 10 frames per second.
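
The following Python sketch illustrates the two mechanisms the abstract describes: template-based feature tracking and Kalman-filtered pose prediction. It is a minimal illustration under assumed interfaces, not the dissertation's implementation; the function and class names, the constant-velocity motion model, the use of OpenCV's normalized cross-correlation, and all parameter values (search radius, noise covariances, dt = 0.1 s to match the reported ~10 fps) are illustrative assumptions.

import numpy as np
import cv2

def track_feature(frame_gray, template, prev_xy, search_radius=20):
    # Locate `template` in a small window around its previous position
    # using normalized cross-correlation (one plausible matching criterion;
    # the dissertation's exact matching method is not specified here).
    x, y = prev_xy
    th, tw = template.shape
    x0 = max(0, x - search_radius)
    y0 = max(0, y - search_radius)
    x1 = min(frame_gray.shape[1], x + tw + search_radius)
    y1 = min(frame_gray.shape[0], y + th + search_radius)
    window = frame_gray[y0:y1, x0:x1]
    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, best = cv2.minMaxLoc(scores)          # best = (x, y) of max score
    return (x0 + best[0], y0 + best[1])

class PoseKalman:
    # Constant-velocity Kalman filter over a 6-DOF head pose
    # (3 rotations, 3 translations); an assumed motion model.
    def __init__(self, dt=0.1, q=1e-3, r=1e-2):   # dt ~ 0.1 s at ~10 fps
        n = 6
        self.x = np.zeros(2 * n)                  # state: [pose, pose_rate]
        self.P = np.eye(2 * n)                    # state covariance
        self.F = np.eye(2 * n)                    # transition: pose += rate*dt
        self.F[:n, n:] = dt * np.eye(n)
        self.H = np.hstack([np.eye(n), np.zeros((n, n))])  # observe pose only
        self.Q = q * np.eye(2 * n)                # process noise (assumed)
        self.R = r * np.eye(n)                    # measurement noise (assumed)

    def predict(self):
        # Propagate the state; returns the predicted pose for the next frame.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:6]

    def update(self, measured_pose):
        # Fuse the pose computed by the minimization procedure.
        y = measured_pose - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
        return self.x[:6]                         # filtered pose estimate

In the processing loop the abstract implies, PoseKalman.predict() would supply the pose prior used to render fresh feature templates from the 3D model for the next frame, which is what limits the error accumulation that plagues purely frame-to-frame template tracking.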