3D modeling and tracking of human lip motions

We address the problem of tracking and reconstructing 3D human lip motions from a 2D view. The problem is challenging both because of the complex nature of lip motions and because a raw video stream of the face provides only minimal data. We counter both difficulties with statistical approaches. We first build a physically-based 3D lip model and train it to cover only the subspace of lip motions. We then track this model in video by finding the shape within the subspace that maximizes the posterior probability of the model given the observed features. Here, the features are the likelihoods of the lip and non-lip color classes: we iteratively derive forces from these likelihoods, apply them to the physical model, and converge to the final solution. Because the model is fully 3D, this framework allows us to track the lips from any head pose, and because of the constraints imposed by the learned subspace, we can accurately estimate the full 3D lip shape from the 2D view.
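The abstract does not give implementation details, but the following minimal Python sketch illustrates the kind of iterative fitting loop it describes: shape coefficients constrained to a learned subspace are updated by image "forces" derived from lip/non-lip color likelihoods at the projected model vertices. All names and parameters (`mean_shape`, `modes`, `lip_llr`, `project_2d`, the step size, the shrinkage term) are assumptions made for illustration, not the authors' actual formulation.

```python
import numpy as np

def track_lip_shape(mean_shape, modes, lip_llr, project_2d, steps=50, lr=0.1):
    """Illustrative sketch (not the paper's implementation).

    mean_shape : (N, 3) mean 3D lip mesh vertices
    modes      : (K, N, 3) learned deformation modes spanning the lip-motion subspace
    lip_llr    : callable (x, y) -> log-likelihood ratio of lip vs. non-lip color at a pixel
    project_2d : callable (N, 3) -> (N, 2) camera projection for the current head pose
    Returns the fitted 3D lip shape, (N, 3).
    """
    b = np.zeros(len(modes))                                   # subspace coefficients
    for _ in range(steps):
        shape = mean_shape + np.tensordot(b, modes, axes=1)    # 3D shape within the subspace
        pts = project_2d(shape)                                # projected 2D vertex positions
        # Image "forces": finite-difference gradient of the lip-color log-likelihood
        # at each projected vertex (depth component is zero, since the view is 2D).
        eps = 1.0
        fx = np.array([lip_llr(x + eps, y) - lip_llr(x - eps, y) for x, y in pts]) / (2 * eps)
        fy = np.array([lip_llr(x, y + eps) - lip_llr(x, y - eps) for x, y in pts]) / (2 * eps)
        forces = np.stack([fx, fy, np.zeros_like(fx)], axis=1)
        # Project forces onto the learned modes and take a gradient step; the small
        # shrinkage toward zero is a crude stand-in for the subspace prior.
        grad = np.tensordot(modes, forces, axes=([1, 2], [0, 1]))
        b = 0.99 * b + lr * grad
    return mean_shape + np.tensordot(b, modes, axes=1)
```

Because only the subspace coefficients `b` are estimated, the 2D color evidence suffices to recover a full 3D shape: depth is filled in by the learned modes rather than measured directly.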
