Model-based global and local motion estimation for videoconference sequences

In this work, we present an algorithm for 3-D face motion estimation in videoconference sequences. The algorithm estimates both the position of the face as an object in 3-D space (global motion) and the movements of portions of the face, such as the mouth or the eyebrows (local motion). It uses a modified version of the standard 3-D face model CANDIDE. We present various techniques to increase the robustness of the global motion estimation, which is based on feature tracking and an extended Kalman filter. The global motion estimate is used as a starting point for local motion detection in the mouth and eyebrow areas. For this purpose, synthetic images of these areas (templates) are generated with texture mapping techniques and then compared to the corresponding regions in the current frame. A set of parameters, called action unit vectors (AUVs), influences the shape of the synthetic mouth and eyebrows. The optimal AUV values are determined by a gradient-based minimization of the error energy between the templates and the actual face areas. The proposed scheme is robust and was successfully tested on sequences of many hundreds of frames.
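
The local motion step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the renderer, the exact energy, or the optimizer, so everything below is an assumption made for illustration. The names `render_template`, `BASE`, and `BASIS` are hypothetical, a toy linear model stands in for the texture-mapped template, the error energy is taken to be a sum of squared differences, and the gradient is approximated by central finite differences with a fixed step size.

```python
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical stand-in for a texture-mapped template: a 5-pixel patch whose
# intensities depend linearly on two AUV-like parameters (assumption).
BASE: Vector = [0.2, 0.4, 0.6, 0.4, 0.2]
BASIS: List[Vector] = [
    [0.0, 0.5, 1.0, 0.5, 0.0],    # effect of parameter 0 on each pixel
    [1.0, 0.5, 0.0, -0.5, -1.0],  # effect of parameter 1 on each pixel
]


def render_template(params: Sequence[float]) -> Vector:
    """Generate the synthetic patch for the given parameter values."""
    return [
        base + sum(p * basis[i] for p, basis in zip(params, BASIS))
        for i, base in enumerate(BASE)
    ]


def error_energy(template: Sequence[float], observed: Sequence[float]) -> float:
    """Sum of squared differences between template and observed patch."""
    return sum((t - o) ** 2 for t, o in zip(template, observed))


def numeric_gradient(
    energy: Callable[[Vector], float], params: Vector, eps: float = 1e-4
) -> Vector:
    """Central finite-difference approximation of the energy gradient."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((energy(up) - energy(down)) / (2 * eps))
    return grad


def minimize(
    energy: Callable[[Vector], float],
    start: Sequence[float],
    step: float = 0.05,
    iterations: int = 200,
) -> Vector:
    """Plain gradient descent with a fixed step size (assumed optimizer)."""
    params = list(start)
    for _ in range(iterations):
        grad = numeric_gradient(energy, params)
        params = [p - step * g for p, g in zip(params, grad)]
    return params


# Simulate an observed face region produced by known parameter values,
# then recover those values starting from the neutral shape (all zeros).
observed = render_template([0.3, -0.2])
estimate = minimize(lambda p: error_energy(render_template(p), observed), [0.0, 0.0])
print([round(v, 4) for v in estimate])
```

In this toy setting the energy is quadratic, so a fixed step suffices. With a real texture-mapped template the energy is generally non-convex, which is presumably why the global motion estimate is used to initialize the search.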
